II.  SPECTROGRAM  CORRELATION


2.1 Introduction


The experimental results in Volume 1 indicate that a spectrogram
correlation process (using principal component analysis) is an
excellent target classification device. The use of a spectrogram
representation is natural to a bionic sonar receiver, since the
spectrogram is an idealized model of the neural transduction
process in the mammalian cochlea. A spectrogram representation,
however, does not necessarily imply the use of a spectrogram
correlator as a signal detection or parameter estimation device.

Section 2.2 considers bionic aspects of spectrogram processing
in some detail. It is shown in Section 2.2 that spectrogram
correlation is in fact a locally optimum detection device when the
data reaches the detector in the form of a spectrogram. The
utilization of a spectrogram correlator by human listeners implies
that detection of a signal with multiple harmonics depends upon the
sum of the squared energies in the harmonics. This phenomenon has
been observed by D. M. Green.

Section 2.3 explores the capabilities of a spectrogram correlator
as a parameter estimation device. The predicted standard
deviation of a spectrogram frequency measurement again corresponds
to psychoacoustic data. The standard deviation of a delay
measurement is also predicted, and experiments to check this
prediction are suggested.

Section 2.4 investigates the spectrogram correlator as a
detector for signals that have been passed through a time-varying,
random channel. It is shown that the expected spectrogram of the
channel output is the convolution of the channel scattering function
with the spectrogram of the input signal. In the case of low energy
coherence (small SNR at any one filter output at any one time
instant), the convolution result implies that a spectrogram correlator
is equivalent to the usual Karhunen-Loève receiver implementation.
Furthermore, the spectrogram correlator can be modified to
accommodate non-Gaussian noise backgrounds by changing the power law
that is used to envelope detect the responses of a bank of bandpass
filters. The spectrogram correlator is therefore an ideal detector
for low SNR signals that have been passed through a random
channel. An example of such a channel is a range extended sonar target
or an undersea propagation channel that involves refraction or
reflection from surface or bottom.


2.2 Signal Reconstruction and Detection from Spectrograms,
with Applications to Theories of Hearing and Animal
Echolocation

2.2.1 Introduction to Section 2.2
2.2.1.A Brief Summary of Section 2.2

Properties of spectrograms and related signal representations
are listed, and some new properties are derived. These
properties indicate that a signal can be reconstructed from its
spectrogram except for the time invariant part of the signal's
phase. A detection method that uses the spectrogram itself as data
is compared to a detector that operates on the reconstructed signal.
The reconstruction method can also be applied to hearing aid design
and to signal synthesis for auditory experimentation.

The theoretical results are used to interpret neurophysiological
properties of the peripheral mammalian auditory system,
and to predict the outcome of various psychoacoustic experiments.
Matched filtering of a signal that is reconstructed from a
spectrogram may be relevant to animal echolocation. Detection
experiments with humans, however, seem to favor the correlation of a
data spectrogram with a reference spectrogram that is either
prespecified (spectrogram correlator) or estimated from the data
itself (spectrogram estimator-correlator).

2.2.1.B Background and Goals

A spectrogram provides a convenient signal representation
by portraying a waveform as a non-negative function of
instantaneous frequency and time. The convenience of the representation
is probably related to the fact that signals are converted into lines
on a plane, or into a three-dimensional surface that shows intensity
as a function of time and frequency. Since these visual
representations are similar to other images in our environment, they can
be easily assimilated by a human observer.

There is evidence that spectrogram-like signal representations
may be formed by the peripheral auditory system (Evans,
1975 and 1977; Siebert, 1968). To be sure, some of the elements
in the biological system are nonlinear-with-memory (Kim, Molnar,
and Pfeiffer, 1973), probabilistic (de Boer, 1975; de Boer and de
Jongh, 1978; Siebert, 1970) or both, but we should nevertheless be
able to further our understanding of audition by analysis of an
idealized model with linear filters and deterministic envelope
detectors.

Our primary motive in studying such a model is to try to
answer questions such as the following:


1.  What  information  about  a signal  is  lost  when  a 
spectrogram  is  formed? 

2.  To  what  extent  can  we  reconstruct  a signal  from  its 
spectrogram? 

3.  Can a data spectrogram and a reference spectrogram
be compared in order to form a nonideal
detector? What is the best operation for implementing
such a comparison? How much worse is this "best"
nonideal detector than an ideal detector?

4.  Is  it  possible  to  construct  an  ideal  detector,  i.e.,  a 
matched  filter,  for  data  that  is  given  in  the  form  of 
a spectrogram?  Are  bats  and  dolphins  capable  of 
synthesizing  ideal  detectors  for  echolocation? 

5.  Man-made  spectrogram  synthesizers  do  not  usually 
incorporate  a great  amount  of  filter  overlap,  but  the 
auditory  spectrogram  (if  it  exists)  is  formed  from 
many  filters  with  transfer  functions  that  overlap 
very  closely.  What  is  the  reason  for  this  difference? 
Why  is  there  such  a high  density  of  hair  cells  along 
the  basilar  membrane  (Fletcher,  1953)  when  cochlear 
resonances  are  so  broad  (Pfeiffer  and  Kim,  1975)? 

6.  Can fine frequency resolution be obtained with
envelope detected responses of overlapping critical
bandwidth (approximately 1/3 octave wide) filters?
Is frequency discrimination capability dependent
upon the sharp, high-frequency cutoffs that are
sometimes observed in neural tuning curves?

7.  How can a better understanding of spectrograms be
applied to hearing aid design or to future
experimental work in audition?


2.2.1.C Outline of Section 2.2


Section 2.2 begins with some definitions and properties of
spectrograms and related signal representations. One of the most
important properties is that an input signal can be reconstructed
from a noise-free spectrogram, except for a multiplicative constant
with unit magnitude and unknown phase. Another property indicates
that only the signal autoambiguity function is needed for matched
filtering, and reconstruction of the waveform itself is thus
unnecessary for detection. The effect of noise upon the signal
reconstruction process is investigated, and sampling considerations
are discussed. A "best" nonideal detector is derived for input data
that is given in the form of a spectrogram. The performance of this
detector is compared with that of a "best" detector that uses the
original input data rather than the spectrogram.

Section 2.2 goes on to consider some ramifications of the
spectrogram as a model for the peripheral auditory system. The
existence of overlapping filters, monaural phase insensitivity, and
critical bands can all be explained in terms of spectrogram
properties. Rapid high-frequency cutoff in neural tuning curves is
evaluated from the viewpoint of spectrogram analysis. Binaural
direction estimation is considered.

An important ramification of a spectrogram representation
is that matched filtering can be performed upon reconstructed input
data, and an ideal detector can theoretically be synthesized in an
animal echolocation system. A nonideal process using spectrogram
correlation can also be used, but such a process requires 5 dB more
signal energy to achieve the same performance as an ideal detector.

The  possibility  of  data  reconstruction  and  subsequent  matched 
filtering  means  that  detection  experiments  can  be  realistically  analyzed 
with  a matched  filter  hypothesis  rather  than  with  an  energy  detector 
hypothesis.  For  a two-alternative,  forced  choice  experiment,  the 
theoretical  function  that  predicts  probability  of  a correct  response 
versus  signal  energy  does  not  have  as  steep  a slope  as  a best  fit  to 
the  data,  when  the  theoretical  curve  is  based  upon  an  energy  detector 
model.  A theoretical  curve  with  steeper  slope  can  be  obtained  from 
a matched  filter  model.  Masking  experiments  with  short  duration 
sinusoids  also  yield  results  that  partially  support  a matched  filter 
hypothesis. 

If a spectrogram is an acceptable model for the response of
the peripheral auditory system, then the fact that a signal can be
almost completely reconstructed from a noise-free spectrogram
suggests a new method of hearing aid design and a new signal
synthesis method for future experiments in audition.

2.2.2 Definitions and Properties of Time-Frequency Energy
Distributions, Spectrograms, and Ambiguity Functions

A given real signal $u_r(t)$ can be used to construct a complex
waveform (Gabor, 1946; Rihaczek, 1969; Franks, 1969)

$$ u(t) = 2^{-1/2} \left[ u_r(t) + j\,\hat{u}_r(t) \right] \tag{1} $$

where $\hat{u}_r(t)$ is the Hilbert transform (Erdelyi, 1954) of $u_r(t)$. The
waveform u(t) in (1) is the so-called analytic representation of
$u_r(t)$, and $\mathrm{Re}\{u(t)\} = 2^{-1/2} u_r(t)$. The Fourier transform of u(t) is

$$ U(f) = \int_{-\infty}^{\infty} u(t) \exp(-j 2\pi f t)\, dt . \tag{2} $$

If u(t) is analytic, U(f) is identically zero for f < 0. The factor $2^{-1/2}$
in (1) is included to preserve signal energy.

The envelope of $u_r(t)$ is usually approximated by passing a
rectified version of $u_r(t)$ through a low-pass filter or integrator.
An exact version of the envelope, however, is

$$ |u(t)| = 2^{-1/2} \left[ u_r^2(t) + \hat{u}_r^2(t) \right]^{1/2} \tag{3} $$

which is the magnitude of the analytic representation of $u_r(t)$.
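The construction in (1) can be illustrated numerically. The following is a minimal sketch, assuming a sampled, even-length real sequence with no DC or Nyquist component and using a discrete Fourier transform in place of the integrals; the helper names `dft`, `idft`, and `analytic` are hypothetical and are not taken from the report. It checks that the scale factor $2^{-1/2}$ makes the energy of u(t) equal to the energy of the real signal.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform (adequate for short test signals)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X):
    """Naive inverse discrete Fourier transform."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def analytic(u_r):
    """Return 2**-0.5 * (u_r + j * Hilbert{u_r}) for an even-length real sequence."""
    n = len(u_r)
    spectrum = dft(u_r)
    # Keep DC and Nyquist once, double positive frequencies, zero negative ones.
    gain = [1.0] + [2.0] * (n // 2 - 1) + [1.0] + [0.0] * (n // 2 - 1)
    unscaled = idft([g * c for g, c in zip(gain, spectrum)])
    return [z / math.sqrt(2.0) for z in unscaled]


N = 64
u_r = [math.cos(2 * math.pi * 5 * t / N) + 0.5 * math.sin(2 * math.pi * 9 * t / N)
       for t in range(N)]
u = analytic(u_r)

energy_real = sum(v * v for v in u_r)
energy_analytic = sum(abs(z) ** 2 for z in u)
print(energy_real, energy_analytic)
```

The magnitude `abs(u[t])` is then the discrete counterpart of the envelope in (3).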

The time-frequency energy density function of u(t) is defined
(Rihaczek, 1968) as

$$ e_{uu}(t, f) = u(t)\, U^*(f) \exp(-j 2\pi f t) . \tag{4} $$

The narrowband ambiguity function of u(t) is (Woodward, 1964)

$$ \chi_{uu}(\tau, \phi) = \int_{-\infty}^{\infty} u(t)\, u^*(t + \tau) \exp(-j 2\pi \phi t)\, dt . \tag{5} $$

Similarly, we can define the cross-energy density function and
cross-ambiguity function by

$$ e_{uv}(t, f) = u(t)\, V^*(f) \exp(-j 2\pi f t) \tag{6} $$

$$ \chi_{uv}(\tau, \phi) = \int_{-\infty}^{\infty} u(t)\, v^*(t + \tau) \exp(-j 2\pi \phi t)\, dt . \tag{7} $$


A spectrogram can be obtained either by bandpass filtering
or by Fourier transformation of a time-windowed signal. In the
case of bandpass filtering, the short-time spectral history is
(Ackroyd, 1971)

$$ S_{uv}(t_1, f_1) = \left| \int_{-\infty}^{\infty} U(f)\, V(f - f_1) \exp(j 2\pi f t_1)\, df \right|^2 \tag{8} $$

$$ \phantom{S_{uv}(t_1, f_1)} = \left| \int_{-\infty}^{\infty} u(t_1 - t)\, v(t) \exp(j 2\pi f_1 t)\, dt \right|^2 \tag{9} $$

where u(t) and v(t) are analytic. $S_{uv}(t_1, f_1)$ is the squared envelope
of the temporal response of a filter with transfer function $V(f - f_1)$,
evaluated at time $t_1$.
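The equality of the filter-bank form (8) and the windowed form (9) can be checked on sampled data. The sketch below is a discrete, circular analogue and not the report's own procedure: integrals become sums over N samples, $f_1$ becomes a DFT bin index `k1`, and $t_1$ becomes a sample index `n1`. The function names and the test signals are assumptions made for illustration.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def spectrogram_freq(u, v, n1, k1):
    """Discrete analogue of (8): multiply spectra, then evaluate at time n1."""
    n = len(u)
    U, V = dft(u), dft(v)
    total = sum(U[k] * V[(k - k1) % n] * cmath.exp(2j * math.pi * k * n1 / n)
                for k in range(n)) / n
    return abs(total) ** 2


def spectrogram_time(u, v, n1, k1):
    """Discrete analogue of (9): circular convolution with the modulated window."""
    n = len(u)
    total = sum(u[(n1 - t) % n] * v[t] * cmath.exp(2j * math.pi * k1 * t / n)
                for t in range(n))
    return abs(total) ** 2


N = 32
# Hypothetical test data: a linear-FM signal and a short Gaussian filter response.
u = [cmath.exp(1j * math.pi * 0.01 * t * t) for t in range(N)]
v = [math.exp(-0.5 * ((t - 4) / 2.0) ** 2) for t in range(N)]

for n1, k1 in [(0, 0), (5, 3), (17, 30)]:
    print(n1, k1, spectrogram_freq(u, v, n1, k1), spectrogram_time(u, v, n1, k1))
```

Each printed pair should agree to within rounding error, since shifting V by `k1` bins is the same as modulating v by the corresponding complex exponential.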

The time-frequency energy density function is defined so as
to portray the variation of u(t) in the time and frequency domains by
means of a three-dimensional plot (Gabor, 1946). The spectrogram
is also constructed so as to display this variation. The spectrogram
is obtained by passing the signal u(t) through a bank of filters that
are all shifted versions of the same basic function V(f).

Historically, the ambiguity function was motivated by the
need to detect a radar signal u(t) in Gaussian noise, and to
simultaneously estimate its delay $\tau$ and Doppler shift $\phi$. If $V^*(f) =
U^*(f)$, the filter $V^*(f - \phi)$ tests the hypothesis that the transmitted
signal spectrum U(f) has been Doppler shifted by $\phi$ Hz. The radar
receiver implements a sequence of such hypotheses by using a bank
of filters with $\phi$-dependent transfer functions. When u(t) is present
at the input, the filter responses as a function of range (time) and
Doppler shift (frequency) are represented by the ambiguity function.
This interpretation of the ambiguity function explains its close
connection to time-frequency signal representations.

Some relevant properties of time-frequency energy density
functions, spectrograms, and ambiguity functions are listed in
Appendix A. These properties show that the autoambiguity function
of an input signal can be obtained from a noise-free spectrogram
by deconvolution. A correlation operation can then be implemented
by integrating the product of the data autoambiguity function and the
conjugated autoambiguity function of a reference signal.
Alternatively, one can obtain the input signal itself (apart from a
complex constant) from the data autoambiguity function.

Several practical questions should be considered before
the deconvolution and detection operations are implemented, or
comparisons with animal audition are made. Filters with variable
bandwidth should be considered (Section 2.2.3), and the required
sampling rate for deconvolution should be examined (Section 2.2.4).
The sampling question is important because two different rates
apply for parsimonious spectrogram representation and for
deconvolution, and because the higher sampling density for deconvolution
can be related to biological data. Finally, the idealized signal
reconstruction process (Section 2.2.5) should be modified to take
account of noise, and an error measure (e.g., mean-square error)
should be obtained in order to describe the effect of noise upon the
reconstructed signal (Section 2.2.6).

2.2.3  Variable  Bandwidth  Filters 

The list of properties in Appendix A pertains to a bank of
constant bandwidth filters $V(f - f_1)$, where $f_1$ is varied. Similar
results for filters with bandwidths that depend upon $f_1$ can sometimes
be obtained (Flaska, 1976), but the derivations are more difficult and
the properties are often not as simple as those given in the Appendix.

It is desirable to exploit the elegance and simplicity of the
constant bandwidth case, but the analysis should also be sufficiently
general to include frequency-dependent passbands. Fortunately,
this problem has a simple solution. Before entering the filter bank,
received spectra can be distorted by a nonlinear frequency
transformation that effectively converts the constant bandwidth filters into
frequency variable ones. For example, if proportional bandwidth
filtering is desired, a given signal spectrum G(f) is transformed into
a new signal U(f) such that

$$ U(f) = \exp(f/2)\, G[\exp(f)] . \tag{10} $$

Since $|U(f)|^2$ is an energy spectral density, the transformation (10)
should be energy-invariant, i.e., $E_u = E_g$, or

$$ \int_{-\infty}^{\infty} |U(f)|^2\, df = \int_{0}^{\infty} |G(f)|^2\, df . \tag{11} $$

This condition is satisfied when the factor exp(f/2) is included in
(10). U(f) is obtained by plotting $f^{1/2} G(f)$ on a logarithmic frequency
scale. Constant-bandwidth filtering of U(f) is equivalent to
proportional bandwidth filtering of G(f). Alternatively, one can use

$$ U(f) = [\cosh(f)]^{1/2}\, G[\sinh(f)] \tag{12} $$

to obtain a more realistic model of mammalian audition. Constant
bandwidth filtering of U(f) now corresponds to constant bandwidth
filtering of G(f) at low frequencies and proportional bandwidth
filtering of G(f) at higher frequencies. The transformation in (12)
was suggested at a recent conference (Bullock, 1977).

If no information is lost in a signal processing operation
with U(f) as the input spectrum, then the original data G(f) can be
obtained from a reconstructed version of U(f).
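The energy invariance of the logarithmic warp can be checked numerically. The sketch below assumes a hypothetical one-sided test spectrum G(f) = f exp(-f), chosen only because its energy is known in closed form (1/4), and integrates $|G(f)|^2$ over f > 0 and $|\exp(f/2)\, G[\exp(f)]|^2$ over all f with a simple midpoint rule. The integration limits and step count are also assumptions.

```python
import math


def G(f):
    """Hypothetical one-sided test spectrum, defined for f > 0."""
    return f * math.exp(-f)


def U(f):
    """Warped spectrum, U(f) = exp(f/2) * G(exp(f)), as in (10)."""
    return math.exp(f / 2.0) * G(math.exp(f))


def integrate(func, a, b, steps=20000):
    """Midpoint-rule integral of func over [a, b]."""
    h = (b - a) / steps
    return h * sum(func(a + (i + 0.5) * h) for i in range(steps))


# Energy of the original spectrum on the linear frequency axis.
E_g = integrate(lambda f: G(f) ** 2, 0.0, 40.0)
# Energy of the warped spectrum on the logarithmic frequency axis.
E_u = integrate(lambda f: U(f) ** 2, -20.0, 5.0)
print(E_g, E_u)
```

Without the factor exp(f/2), the second integral would instead equal the integral of $|G(f)|^2 / f$, which is why the weighting is needed.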


2.2.4  Sampling  the  Spectrogram 


2.2.4.A Minimum Sampling Rates in Time and
Frequency Directions


Consider (8) with $f_1$ fixed. Fixing $f_1$ corresponds to looking
at a constant-$f_1$ profile of the spectrogram, i.e., at the squared
envelope of the time response of the filter $V(f - f_1)$. The transfer
function V(f) is a low pass function with two-sided bandwidth $2B_v$,
i.e., V(f) = 0 for $|f| > B_v$. For an analytic or one-sided
representation, V(f) = 0 for f < 0 and $f > B_v$, and $V(f - f_1) = 0$ for $f < f_1$
and $f > f_1 + B_v$.

Let $x(t, f_1)$ be the real part of the filter response when the
input signal is u(t), i.e.,

$$ x(t, f_1) = \mathrm{Re}\left\{ \int_{-\infty}^{\infty} U(f)\, V(f - f_1) \exp(j 2\pi f t)\, df \right\} . \tag{13} $$


Assuming that u(t) is always accompanied by a small amount of white
noise, $x(t, f_1)$ has bandwidth $B_v$, and it should be sampled with $2B_v$
samples/sec. The analytic representation involves a complex time
function, and to completely specify the filter response, we need to
obtain samples of

$$ y(t, f_1) = \mathrm{Im}\left\{ \int_{-\infty}^{\infty} U(f)\, V(f - f_1) \exp(j 2\pi f t)\, df \right\} . \tag{14} $$

The real function $y(t, f_1)$ should also be sampled with $2B_v$ samples/
sec. A total of $4B_v$ real-valued samples per second or $2B_v$ complex
samples per second are therefore required.


When u(t) is a random process, it can be shown that $x(t, f_1)$
and $y(t, f_1)$ are statistically uncorrelated random variables
(Appendix C). The complex samples $[x(t, f_1) + j\,y(t, f_1)]$ and
$[x(t - k/2B_v, f_1) + j\,y(t - k/2B_v, f_1)]$ are also uncorrelated for
any nonzero integer k (Slepian, 1954; Cook and Bernfeld, 1967).


The squared envelope of the filter output is

$$ S_{uv}(t, f_1) = \left[ x^2(t, f_1) + y^2(t, f_1) \right] / 2 . \tag{15} $$

The squaring operation doubles the bandwidth of $x(t, f_1)$ and
$y(t, f_1)$. The sum in (15) has the same bandwidth as $x^2$ or $y^2$, and
the squared envelope can therefore be represented by $4B_v$
real-valued samples per second.
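The bandwidth doubling can be seen in a small discrete experiment. The sketch below assumes a hypothetical complex low-pass response occupying DFT bins -B through +B (a stand-in for a filter output limited to $|f| \le B_v$), forms x, y and the squared envelope as in (15), and reports the highest occupied bin of each. The coefficient values are arbitrary.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def highest_bin(x, tol=1e-9):
    """Largest non-negative DFT bin of a real sequence with non-negligible content."""
    n = len(x)
    X = dft(x)
    return max(k for k in range(n // 2 + 1) if abs(X[k]) > tol * n)


N = 64
B = 5  # low-pass response occupies bins -B..B (hypothetical)
coeff = {k: cmath.exp(1j * k * k) / (1 + abs(k)) for k in range(-B, B + 1)}
z = [sum(c * cmath.exp(2j * math.pi * k * t / N) for k, c in coeff.items())
     for t in range(N)]

x = [w.real for w in z]                               # real part, as in (13)
y = [w.imag for w in z]                               # imaginary part, as in (14)
S = [(a * a + b * b) / 2.0 for a, b in zip(x, y)]     # squared envelope, as in (15)

print(highest_bin(x), highest_bin(S))
```

The squared envelope reaches twice the highest bin of x, so it needs twice the sampling rate of x or y alone.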
A similar argument can be used to determine the minimum
sampling rate for a constant-$t_1$ profile of $S_{uv}(t_1, f_1)$. It must be
assumed, however, that the filter impulse response v(t) is time
limited as well as band limited, and this situation cannot occur.
An alternate assumption is that a function is adequately specified if
it can be reconstructed to within a given small number $\epsilon$, and that,
for any filter of interest, there exist sampling densities in time and
frequency such that $\epsilon$ is not exceeded. In other words, we assume
the existence of numbers $B_v$ and $T_v$ such that v(t) is adequately
specified by samples that are $1/(2B_v)$ apart and V(f) is adequately
specified by samples that are $1/(2T_v)$ apart. $B_v$ has been called
the bandwidth of v(t), and $T_v$ will be called the duration of v(t).

From (9), the time width of the real or imaginary parts of a
constant-$t_1$ response is $T_v$, assuming that a small amount of noise
is always added to u(t), so that the input data has infinite duration.
Forming the squared envelope in the frequency domain leads to a
constant-$t_1$ spectrogram profile with a time width of $2T_v$, and
$S_{uv}(t_1, f)$ should be sampled at a rate of at least $4T_v$ samples/Hz.

In summary, the spectrogram can be represented by a grid
of samples that are spaced $(4B_v)^{-1}$ seconds apart in the time
direction and $(4T_v)^{-1}$ Hz apart in the frequency direction. In a
noise-free situation, (8) and (9) imply that samples can be spaced
$[4 \min(B_u, B_v)]^{-1}$ seconds apart in time and $[4 \min(T_u, T_v)]^{-1}$ Hz
apart in frequency.

2.2.4.B  Sampling  Rates  for  Two-Dimensional 
Deconvolution 

Property 1 in Appendix A states that the spectrogram can be
formed by convolution of $e_{uu}(t, f)$ with $e_{vv}(t, f)$. From the definition
of the energy density function (4), $e_{uu}(t, f)$ and $e_{vv}(t, f)$ can be
adequately represented by using a sampling rate in time that is at
least $2 \max(B_u, B_v)$ samples/sec and a sampling rate in frequency
that is at least $2 \max(T_u, T_v)$ samples/Hz. These sampling rates
are therefore required for two-dimensional digital convolution of
$e_{uu}(t, f)$ and $e_{vv}(t, f)$. The output of the convolution process will
be sampled at the same rates as above, and these rates will often
be much larger than the minimum required sampling rates for the
spectrogram ($4B_v$ samples/sec and $4T_v$ samples/Hz). High sampling
rates are also required if $S_{uv}(t, f)$ is to be subjected to a digital
deconvolution operation in order to separate $e_{uu}(t, f)$ from $e_{vv}(t, f)$.
The estimate of $e_{uu}(t, f)$ after deconvolution may be undersampled
unless the spectrogram is sampled at $2 \max(B_u, B_v)$ samples/sec
and $2 \max(T_u, T_v)$ samples/Hz.

One way to obtain the relatively high sampling rates that
are required for deconvolution is to interpolate between the more
sparse samples that are required for efficient representation of
the spectrogram. Another method is to construct a spectrogram
that is deliberately over-sampled. Physically, a given frequency
sample is obtained by using a given value of $f_1$ in (8), i.e., by
using a filter with the specified center frequency. If frequency
interpolation is not to be used, the filter center frequencies should
be separated by at most

$$ \Delta f_1 = \left[ 2 \max(T_u, T_v) \right]^{-1} \ \text{Hz} . \tag{16} $$

For non-dispersive filters,

$$ B_v = \alpha\, T_v^{-1} \tag{17} $$

where $\alpha \approx 1$, and if $T_u \gg T_v$,

$$ \Delta f_1 \big|_{T_u \gg T_v} \ll B_v . \tag{18} $$

Since each filter has bandwidth $B_v$, it follows from (18)
that the filter transfer functions must overlap in order to retain
information about $e_{uu}(t, f)$, if frequency interpolation is not to be
used. The required degree of overlap increases as the signal
duration $T_u$ becomes longer.
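The sampling rules of this section reduce to simple arithmetic. The sketch below collects them in one hypothetical helper, `sampling_plan`: the minimum spectrogram rates ($4B_v$ samples/sec, $4T_v$ samples/Hz), the rates needed for deconvolution ($2\max(B_u, B_v)$ samples/sec, $2\max(T_u, T_v)$ samples/Hz), the largest filter spacing without frequency interpolation, $[2\max(T_u, T_v)]^{-1}$ Hz, and the resulting number of filter spacings per filter bandwidth. The numerical values are illustrative assumptions, not data from the report.

```python
def sampling_plan(B_u, T_u, B_v, T_v):
    """Sampling rates and filter spacing implied by the rules of Section 2.2.4.

    B_u, B_v are bandwidths in Hz; T_u, T_v are durations in seconds.
    """
    deconv_rate_freq = 2.0 * max(T_u, T_v)           # samples/Hz
    max_spacing_hz = 1.0 / deconv_rate_freq          # largest center-frequency spacing
    return {
        "spectrogram_rate_time": 4.0 * B_v,          # samples/sec
        "spectrogram_rate_freq": 4.0 * T_v,          # samples/Hz
        "deconv_rate_time": 2.0 * max(B_u, B_v),     # samples/sec
        "deconv_rate_freq": deconv_rate_freq,        # samples/Hz
        "max_center_spacing_hz": max_spacing_hz,
        # Number of center-frequency steps that fit inside one filter bandwidth.
        "overlap_factor": B_v / max_spacing_hz,
    }


# Hypothetical case: a 1 s, 1 kHz signal analyzed with 100 Hz, 10 ms filters.
plan = sampling_plan(B_u=1000.0, T_u=1.0, B_v=100.0, T_v=0.01)
for name, value in plan.items():
    print(name, value)
```

In this example the filters are 100 Hz wide but must be spaced no more than 0.5 Hz apart, so neighboring transfer functions overlap heavily.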

A wideband noise process (e.g., thermal noise) is usually
added to the signal, u(t). In this case, a maximum signal
bandwidth and maximum signal duration must be explicitly
implemented by the receiver, since the data (signal plus noise) is neither
band limited nor time limited. In order to apply sampling ideas,
the receiver must introduce a window function or gate in the time
domain. Most transducers experience a gradual decrease in
sensitivity at sufficiently high frequencies, but for sampling purposes it
may be advantageous to deliberately introduce a more rapid high
frequency cutoff as well as a time window.

2.2.5 Reconstruction of a Noise-Free Signal from a
Noise-Free Spectrogram

2.2.5.A Deconvolution

Under noise-free conditions, we wish to reconstruct the signal
u(t) from the spectrogram $S_{uv}(t_1, f_1)$, where we know the filter
function v(t). From Property 8 in Appendix A, $\chi_{uu}(\tau, \phi)$ can be obtained
from the two-dimensional Fourier transform of the spectrogram. The
Fourier transformed spectrogram is divided by $\chi_{vv}(-\tau, \phi)$, where
$\chi_{vv}(-\tau, \phi)$ can be formed from our assumed knowledge of v(t), as
in (5). Dividing the Fourier transformed spectrogram by $\chi_{vv}(-\tau, \phi)$
yields $\chi_{uu}(\tau, \phi)$. From Property 9 in Appendix A, we can solve
for u(t) by using (A14). The solution, however, will always be
multiplied by a factor exp(jX), where X is an unknown constant.

Under ideal conditions, we can therefore reconstruct the input
signal u(t) from its spectrogram, except for an unknown, complex
factor with unity magnitude.
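The unknown unit-magnitude factor arises because the spectrogram itself does not change when the signal is multiplied by exp(jX). The sketch below checks this on sampled data, using a discrete, circular analogue of the windowed form of the spectrogram; the function name, the test signals, and the value X = 0.7 are assumptions made for illustration.

```python
import cmath
import math


def spectrogram(u, v, n1, k1):
    """Squared magnitude of the circular convolution of u with the modulated window v."""
    n = len(u)
    total = sum(u[(n1 - t) % n] * v[t] * cmath.exp(2j * math.pi * k1 * t / n)
                for t in range(n))
    return abs(total) ** 2


N = 16
u = [cmath.exp(1j * math.pi * 0.03 * t * t) for t in range(N)]
v = [math.exp(-0.5 * ((t - 3) / 1.5) ** 2) for t in range(N)]

X = 0.7  # arbitrary constant phase, in radians
u_rotated = [z * cmath.exp(1j * X) for z in u]

max_difference = max(
    abs(spectrogram(u, v, n1, k1) - spectrogram(u_rotated, v, n1, k1))
    for n1 in range(N) for k1 in range(N)
)
print(max_difference)
```

Since u and its rotated copy give the same spectrogram at every grid point, no reconstruction procedure can recover X from the spectrogram alone.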

Multiplication of the Fourier transform of the spectrogram
by $[\chi_{vv}(-\tau, \phi)]^{-1}$ is a form of two-dimensional filtering. The
realizability of the filter $[\chi_{vv}(-\tau, \phi)]^{-1}$ depends, in part, upon the absence
of zeroes in the function $\chi_{vv}(\tau, \phi)$. Many filter functions v(t) give
rise to ambiguity zeroes on the $\tau$-$\phi$ plane. One function that does
not produce ambiguity zeroes is a Gaussian pulse. The ambiguity
function of such a pulse is Gaussian in two dimensions (Cook and
Bernfeld, 1967).
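Deconvolution by division in the transform domain can be sketched generically. The example below is not the report's Property 8; it assumes a small periodic two-dimensional array that is smeared by circular convolution with a Gaussian kernel, and it recovers the array by dividing two-dimensional DFTs. The kernel width is chosen so that the kernel transform has no zeroes on the sampled grid, which is the discrete counterpart of the realizability condition above. All names and sizes are hypothetical.

```python
import cmath
import math


def dft2(a, sign=-1):
    """Naive 2-D DFT of a square array (sign=+1 gives the unnormalized inverse)."""
    n = len(a)
    w = [[cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n)]
         for k in range(n)]
    rows = [[sum(w[k][t] * row[t] for t in range(n)) for k in range(n)] for row in a]
    return [[sum(w[k][t] * rows[t][c] for t in range(n)) for c in range(n)]
            for k in range(n)]


def idft2(A):
    n = len(A)
    return [[val / (n * n) for val in row] for row in dft2(A, sign=+1)]


def smear(e, g):
    """2-D circular convolution of e with the kernel g."""
    n = len(e)
    return [[sum(e[p][q] * g[(i - p) % n][(j - q) % n]
                 for p in range(n) for q in range(n))
             for j in range(n)] for i in range(n)]


def deconvolve(s, g):
    """Divide the 2-D transform of s by that of g, then invert."""
    n = len(s)
    S, K = dft2(s), dft2(g)
    ratio = [[S[i][j] / K[i][j] for j in range(n)] for i in range(n)]
    return [[val.real for val in row] for row in idft2(ratio)]


N = 8


def bump(d, sigma=0.8):
    """Periodic Gaussian profile; its DFT stays positive for this sigma and N."""
    d = min(d, N - d)
    return math.exp(-0.5 * (d / sigma) ** 2)


g = [[bump(i) * bump(j) for j in range(N)] for i in range(N)]
e_true = [[0.0] * N for _ in range(N)]
e_true[2][3] = 1.0
e_true[5][1] = 0.5

e_est = deconvolve(smear(e_true, g), g)
max_err = max(abs(e_est[i][j] - e_true[i][j]) for i in range(N) for j in range(N))
print(max_err)
```

A kernel whose transform vanished at some grid point would make the division undefined there, and the corresponding component of the input could not be recovered.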

2.2.5.B Sharpening-Up the "Smearing" Function, $e_{vv}(t, f)$

From Property 9, $e_{uu}(t, f)$ contains information about the
signal, but Property 1 says that $e_{uu}(t, f)$ is "smeared" by
two-dimensional convolution with $e_{vv}(t, -f)$ when a spectrogram is
formed.

An intuitively appealing approach to signal reconstruction is
thus to sharpen up the function $e_{vv}(t, f)$ in (A1).

From Property 11, we have

$$ S_{uv}(t, f) * e_{vv}(-t, f) = e_{uu}(t, f) * |\chi_{vv}(t, f)|^2 . \tag{19} $$

The operation in (19) would provide an accurate version of
$e_{uu}(t, f)$ if

$$ |\chi_{vv}(t, f)|^2 = \delta(t)\, \delta(f) . \tag{20} $$

Unfortunately, such a condition is impossible because

$$ |\chi_{vv}(0, 0)|^2 = E_v^2 \tag{21} $$

(Woodward, 1964; Cook and Bernfeld, 1967), where

$$ E_v = \int |v(t)|^2\, dt . \tag{22} $$

It is possible to obtain a squared ambiguity function with a sharp
maximum of height $E_v^2$ at t = f = 0, but $|\chi_{vv}(t, f)|^2$ will be nonzero
over other parts of the t, f plane because of the volume invariance
property

$$ \iint |\chi_{vv}(t, f)|^2\, dt\, df = E_v^2 \tag{23} $$

(Woodward, 1964; Cook and Bernfeld, 1967).
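The peak height and volume invariance properties have exact discrete counterparts that are easy to verify. The sketch below assumes a periodic, sampled signal of N points and a circular version of the ambiguity function; with unit time spacing and frequency spacing 1/N, the double integral becomes a double sum divided by N. The test pulse and function name are hypothetical.

```python
import cmath
import math


def ambiguity(v):
    """Circular discrete ambiguity function: rows index delay m, columns Doppler bin k."""
    n = len(v)
    return [[sum(v[t] * v[(t + m) % n].conjugate()
                 * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
             for k in range(n)] for m in range(n)]


N = 16
# Hypothetical pulse: Gaussian-weighted linear FM.
v = [math.exp(-0.5 * ((t - 8) / 2.5) ** 2) * cmath.exp(1j * math.pi * 0.04 * t * t)
     for t in range(N)]

E_v = sum(abs(z) ** 2 for z in v)
chi = ambiguity(v)

peak = abs(chi[0][0]) ** 2
volume = sum(abs(chi[m][k]) ** 2 for m in range(N) for k in range(N)) / N
largest = max(abs(chi[m][k]) ** 2 for m in range(N) for k in range(N))
print(E_v ** 2, peak, volume)
```

Both `peak` and `volume` equal the squared energy, whatever pulse is chosen, so sharpening the central spike can only move the remaining volume elsewhere on the plane.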

Suppose that the ambiguity volume that is not present in the
sharp spike at t = f = 0 is evenly distributed, i.e.,

$$ |\chi_{vv}(t, f)|^2 = (\text{a narrow spike at } t = f = 0) + (\text{a constant for all } t, f) . \tag{24} $$

This is a so-called thumbtack ambiguity function, and it can be
approximated with signals v(t) that have large time-bandwidth
product. If $|\chi_{vv}(t, f)|^2$ has the form (24), then

$$ e_{uu}(t, f) * |\chi_{vv}(t, f)|^2 = (e_{uu}(t, f) * \text{sharp spike}) + (e_{uu}(t, f) * \text{a constant}) \approx c_1\, e_{uu}(t, f) + c_2\, E_u $$

where $c_1$ and $c_2$ are constants and $E_u$ is the energy of the signal,
as in (A9). From (19),

$$ S_{uv}(t, f) * e_{vv}(-t, f) - c_2\, E_u \approx c_1\, e_{uu}(t, f) \tag{25} $$

if $|\chi_{vv}(t, f)|^2$ is described by (24).

Eq. (25) states that the time-frequency energy density
function of the signal, $e_{uu}(t, f)$, can be approximated without a
deconvolution process if $|\chi_{vv}(t, f)|^2$ resembles a thumbtack. The
time-bandwidth product of v(t) must be large in order to obtain a
thumbtack ambiguity function, and the operation in (25) is a
two-dimensional pulse compression process.

2.2.6  Estimation  of  a Signal  Waveform  from  Its 
Spectrogram  in  the  Presence  of  Noise 

The problem is now to estimate the signal from a
noise-corrupted spectrogram. From Property 9, a simpler version of
the problem is to estimate $e_{uu}(t, f)$ or $\chi_{uu}(\tau, \phi)$, the signal's energy
density function or its ambiguity function. The spectrogram can be
corrupted by noise that is external to the receiver and also by noise
that is internal to the receiver.

Figure 1 shows the formulation of the problem and the
assumed form of its solution. The data consists of a signal u(t)
plus a sample function of noise n(t), and the data is represented by
its energy density function or by its ambiguity function, which is
the Fourier transform of its energy density function (Property 3).
According to Property 8 in Appendix A, the Fourier transform of
the data spectrogram is obtained by forming the product
$\chi_{u+n,u+n}(\tau, \phi)\, \chi_{vv}(-\tau, \phi)$, where $\chi_{vv}(\tau, \phi)$ is the ambiguity function of the
filter impulse response, v(t). The external noise n(t) is
characterized by the function

$$ N_e(\tau, \phi) = \chi_{nn}(\tau, \phi) + \chi_{un}(\tau, \phi) + \chi_{nu}(\tau, \phi) \tag{26} $$

and

$$ \chi_{u+n,u+n}(\tau, \phi) = \chi_{uu}(\tau, \phi) + N_e(\tau, \phi) . \tag{27} $$
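The decomposition of the data ambiguity function into a signal term plus the external noise term follows from the fact that the cross-ambiguity function is linear in its first argument and conjugate-linear in its second. The sketch below checks this on sampled data with a circular discrete cross-ambiguity function; the signal, the seeded pseudo-random noise, and the function name are assumptions made for illustration.

```python
import cmath
import math
import random


def cross_ambiguity(a, b):
    """Circular discrete cross-ambiguity: rows index delay m, columns Doppler bin k."""
    n = len(a)
    return [[sum(a[t] * b[(t + m) % n].conjugate()
                 * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
             for k in range(n)] for m in range(n)]


N = 16
rng = random.Random(0)  # fixed seed so the run is repeatable
u = [cmath.exp(1j * math.pi * 0.05 * t * t) for t in range(N)]
noise = [complex(rng.gauss(0.0, 0.3), rng.gauss(0.0, 0.3)) for _ in range(N)]
data = [s + w for s, w in zip(u, noise)]

chi_dd = cross_ambiguity(data, data)   # ambiguity function of signal plus noise
chi_uu = cross_ambiguity(u, u)
chi_nn = cross_ambiguity(noise, noise)
chi_un = cross_ambiguity(u, noise)
chi_nu = cross_ambiguity(noise, u)

# External noise term: noise autoambiguity plus the two signal-noise cross terms.
N_e = [[chi_nn[m][k] + chi_un[m][k] + chi_nu[m][k] for k in range(N)]
       for m in range(N)]

max_diff = max(abs(chi_dd[m][k] - chi_uu[m][k] - N_e[m][k])
               for m in range(N) for k in range(N))
print(max_diff)
```

The cross terms depend on the signal as well as on the noise, so the external noise term is not independent of the signal being estimated.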


In addition to the external noise n(t), Figure 1 shows a
second, independent noise source $N_i(\tau, \phi)$ that is added to the Fourier
transform of the data spectrogram. $N_i(\tau, \phi)$ represents a sample
function of the internal noise, which can be associated, for example,
with quantization errors in the representation of the spectrogram.

An  estimating  filter  H(T,0)  is  used  to  form  a minimum 
mean-square  error  (MMSE)  estimate  of  the  input  function  X^(T,  0). 

The  mean- square  error  is 


MSE  = E 


| / / IX  - [(X  + N ) X +N.  1 H 

I I I | uu  L uu  e w i J 


dT  d0}.  (28) 
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2-0  FOURIER  TRANSFORM 
OF  SPECTROGRAM 


2 - D FOURIER  TRANSFORM  OF 
SPECTROGRAM  CORRUPTED  BY 
INTERNAL  NOISE 


xuu(T,<»  = ESTIMATE  OF  SIGNAL  AMBIGUITY  FUNCTION 


Figure  1.  An  estimation  problem  that  is  encountered  when  a 
signal  or  its  autoambiguity  function  is  reconstructed  from 
a spectrogram. 
In Appendix B, a variational approach is used to find $H(\tau,\phi)$
such that MSE is minimized. The resulting function $H_o(\tau,\phi)$ is

$$H_o(\tau,\phi) = \frac{\overline{|\chi_{uu}(\tau,\phi)|^2}\;\chi_{vv}^*(-\tau,\phi)}{|\chi_{vv}(-\tau,\phi)|^2 \left[\, \overline{|\chi_{uu}(\tau,\phi)|^2} + P_e(\tau,\phi) \right] + P_i(\tau,\phi)} \tag{29}$$

where $P_e(\tau,\phi) = E\{|N_e(\tau,\phi)|^2\}$,
$P_i(\tau,\phi) = E\{|N_i(\tau,\phi)|^2\}$, and
$\overline{|\chi_{uu}(\tau,\phi)|^2}$ is an expected value of
$|\chi_{uu}(\tau,\phi)|^2$ that is based upon accumulated knowledge of
the signals that are encountered in practice.
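As an illustration of how (29) behaves point by point, the following is a minimal sketch that evaluates the filter gain at a single $(\tau,\phi)$ sample. The function name, argument names, and numerical values are hypothetical and are not taken from the report; the formula assumes the form of (29) given above.

```python
def mmse_deconvolution_gain(chi_vv, mean_sq_chi_uu, p_e, p_i):
    """Gain of the generalized deconvolution filter (29) at one (tau, phi) point.

    chi_vv         -- complex sample of chi_vv(-tau, phi) (filter ambiguity function)
    mean_sq_chi_uu -- expected |chi_uu|^2 at this point (prior signal knowledge)
    p_e, p_i       -- external and internal noise powers P_e and P_i at this point
    """
    denom = abs(chi_vv) ** 2 * (mean_sq_chi_uu + p_e) + p_i
    if denom == 0.0:
        raise ZeroDivisionError("filter is unbounded at this point")
    return mean_sq_chi_uu * chi_vv.conjugate() / denom


# Hypothetical values. With no noise the gain reduces to plain
# deconvolution, i.e. 1 / chi_vv.
h_clean = mmse_deconvolution_gain(0.5 + 0.5j, 2.0, 0.0, 0.0)

# With internal noise present the gain stays bounded (here zero)
# even at a point where chi_vv vanishes.
h_null = mmse_deconvolution_gain(0j, 2.0, 0.0, 0.1)
```

The second call shows why nonzero internal noise power enlarges the set of admissible filters: the denominator cannot vanish when $P_i > 0$.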

Substituting $H_o(\tau,\phi)$ from (29) back into (28), we obtain the
minimum mean-square error $\mathrm{MSE}_o$ of (30) (see Appendix B).
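As a consistency check on (29), the substitution can be carried out directly. The following is a sketch only, not necessarily the form printed in Appendix B: it assumes that $N_e$ and $N_i$ are zero-mean and uncorrelated with $\chi_{uu}$ and with each other, and it suppresses the arguments $(\tau,\phi)$, with $\chi_{vv}$ standing for $\chi_{vv}(-\tau,\phi)$.

```latex
\mathrm{MSE}_o = \iint
  \frac{\overline{|\chi_{uu}|^2}\,\bigl[\,|\chi_{vv}|^2 P_e + P_i\,\bigr]}
       {|\chi_{vv}|^2\,\bigl[\,\overline{|\chi_{uu}|^2} + P_e\,\bigr] + P_i}
  \, d\tau\, d\phi
```

Under these assumptions the integrand vanishes when $P_e = P_i = 0$, and it reduces to $\overline{|\chi_{uu}|^2}\,P_e / (\overline{|\chi_{uu}|^2} + P_e)$, which does not involve $\chi_{vv}$, when $P_i = 0$.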


If both $P_e$ and $P_i$ are zero, (29) is identical to (A13),
$e_{uu}(t,f)$ is obtained by deconvolution, and MSE = 0. If $P_i$ is
zero but $P_e$ is not, $\mathrm{MSE}_o$ in (30) is independent of the
filter impulse response $v(t)$. For no internal noise, any convenient
filter function can be used, provided that $H_o(\tau,\phi)$ is bounded.
This result is surprising, since it implies that the time and frequency
resolution of the reconstructed signal energy density function
$e_{uu}(t,f)$ is independent of the bandwidth of the analyzing filters!
If there is no internal noise, spectrograms that are obtained with
octave-wide filters yield the same mean-square error as spectrograms
that are obtained with very narrowband filters, even for narrowband
signals! Without frequency interpolation, this result holds only if
(16) is satisfied, and the signal duration $T_u$ will be large for
narrowband signals. Although wideband filters can be used, their center
frequencies must be closely spaced, and the resulting transfer
functions will overlap.

The existence of internal noise can be advantageous from the viewpoint
of realizability. If $P_i(\tau,\phi)$ has no zeroes, then the filter
$H_o(\tau,\phi)$ will be bounded (i.e., realizable) even if
$|\chi_{vv}(-\tau,\phi)|^2 = 0$ for many values of $(\tau,\phi)$. The
condition $P_i(\tau,\phi) \neq 0$ can therefore result in a larger set
of admissible filter functions, $v(t)$, such that $H_o(\tau,\phi)$ is
bounded.

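Another interesting condition occurs for relatively large internal
noise power, i.e.,

$$P_i(\tau,\phi) \gg |\chi_{vv}(-\tau,\phi)|^2 \left[\, \overline{|\chi_{uu}(\tau,\phi)|^2} + P_e(\tau,\phi) \right]. \tag{32}$$

In this case,

$$H_o(\tau,\phi) \approx \frac{\overline{|\chi_{uu}(\tau,\phi)|^2}}{P_i(\tau,\phi)}\;\chi_{vv}^*(-\tau,\phi). \tag{33}$$

If $P_i(\tau,\phi) \propto \overline{|\chi_{uu}(\tau,\phi)|^2}$, (33)
describes the "sharpening" operation in Property 11 of Appendix A and
(19) - (25). If $P_i(\tau,\phi)$ is constant, (33) is a "sharpening"
operation in cascade with a filter whose transfer function is
proportional to $\overline{|\chi_{uu}(\tau,\phi)|^2}$.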

In many situations, it may be desirable to reconstruct the noisy input
data as accurately as possible. Filtering algorithms can then be
applied to the reconstructed process. In this case, $P_e(\tau,\phi)$
is by definition equal to zero, since the desired signal is the data
itself.


In summary, we can obtain an optimum (MMSE) reconstruction of
$\chi_{uu}(\tau,\phi)$ by using the generalized deconvolution filter
$H_o(\tau,\phi)$ in (29). In order that $\chi_{uu}(\tau,\phi)$ is not
under-sampled, the spectrogram and the filter function should be
sampled at $4\max(T_u, T_v)$ samples/Hz in the frequency direction and
$4\max(B_u, B_v)$ samples/sec in time. In the absence of internal
noise, the mean square error of the $\chi_{uu}(\tau,\phi)$ estimate is
independent of the shape of the filter transfer function, $V(f)$. In
the presence of internal noise, the mean square error of the
reconstruction process depends upon $V(f)$. If a narrowband signal is
masked by narrowband noise, (30) predicts a critical band effect, as
shown in Appendix E.
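The sampling rule in the summary can be turned into a small helper. This is a minimal sketch; the function name and the numerical durations and bandwidths are hypothetical.

```python
def spectrogram_sampling_rates(t_u, t_v, b_u, b_v):
    """Return (samples per Hz, samples per second) from the 4*max(...) rule.

    t_u, t_v -- durations of the signal and of the filter impulse response (s)
    b_u, b_v -- bandwidths of the signal and of the filter (Hz)
    """
    per_hz = 4.0 * max(t_u, t_v)
    per_sec = 4.0 * max(b_u, b_v)
    return per_hz, per_sec


# Hypothetical case: a 100 ms narrowband signal analyzed with wide (500 Hz) filters.
per_hz, per_sec = spectrogram_sampling_rates(t_u=0.1, t_v=0.002, b_u=50.0, b_v=500.0)
spacing_hz = 1.0 / per_hz     # center-frequency spacing, about 2.5 Hz
spacing_sec = 1.0 / per_sec   # time-sample spacing, about 0.5 ms
```

In this example the filters are 500 Hz wide but must be spaced about 2.5 Hz apart, which illustrates the overlap requirement for long, narrowband signals.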

2.2.7  Detection  of  Weak  Signals  in  Noise 
2.2.7.A  Spectrogram Correlation

The  spectrogram  contains  all  the  information  about  an  input 
signal,  except  for  a factor  with  unity  magnitude  and  unknown  phase. 

If  a signal  is  known  exactly,  or  if  some  of  its  characteristics  (e.g., 
its  bandwidth)  are  known,  it  should  be  possible  to  exploit  this 
knowledge  in  order  to  detect  the  signal  from  a noisy  spectrogram. 

The  design  and  performance  of  such  a detector  are  discussed  in 
this  section. 

For detection of a known signal with spectrogram $S_{uv}(t,f)$, we can
compare the likelihood ratio (Van Trees, 1968)

$$\Lambda = p(Z \mid S_{uv}, H_1) \,/\, p(Z \mid H_0) \tag{34}$$

with a threshold. In (34), $p(Z \mid S_{uv}, H_1)$ is the probability
that the observed spectrogram $Z(t,f)$ will occur, given that the known
signal, as well as noise, is present ($H_1$ true), and
$p(Z \mid H_0)$ is the probability that the observed spectrogram will
occur, given that only noise is present ($H_0$ true).


It  often  happens  that  the  form  of  the  detector  is  dependent 
upon  signal  level,  and  there  is  no  simple  operation  that  can  be  applied 
to  the  data  for  all  signal  levels.  In  this  case,  one  can  resort  to  the 
use  of  a locally  optimum  detector,  which  is  designed  only  for  very 
small  signal-to-noise  ratio  (Capon,  1961;  Middleton,  1966). 

Although  such  a detector  may  be  suboptimum  for  larger  SNR,  the 
use  of  an  optimum  statistic  is  most  critical  for  small  SNR,  and 
suboptimum  behavior  for  larger  SNR  is  often  acceptable. 

A locally optimum detector is obtained by writing $\Lambda$ in terms
of an SNR parameter $\theta$. The detection statistic is then

$$\left. \frac{\partial \ln \Lambda(\theta)}{\partial \theta} \right|_{\theta = 0}. \tag{35}$$

If the derivative of $\ln \Lambda$ with respect to $\theta$ is greater
than a threshold $\gamma$ when $\theta$ equals zero, then the test
indicates that the signal is present. Otherwise, it is decided that
the signal is absent.

It is shown in Appendix C that the left-hand side of (35) depends upon
the quantity

$$\sum_i \sum_j z_{ij}\, s_{ij} \approx \iint Z(t,f)\, S_{uv}(t,f)\, dt\, df \tag{36}$$

where $z_{ij} = Z(t_i, f_j)$ and $s_{ij} = S_{uv}(t_i, f_j)$ are
samples of the data spectrogram and of the noise-free signal
spectrogram, respectively.
A locally optimum likelihood ratio test indicates that the
spectrogram correlation process in (36) is an ideal method for
detection of a signal with known spectrogram in additive, white,
Gaussian noise, when signal-to-noise ratio is small. Samples of the
data spectrogram $Z(t,f)$ are correlated with corresponding samples of
the noise-free signal spectrogram $S_{uv}(t,f)$.
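The sampled form of the spectrogram correlation statistic can be sketched as follows. The function names, the threshold, and the small example arrays are hypothetical; the code only illustrates the sum of sample products followed by a threshold comparison.

```python
def spectrogram_correlation(z, s):
    """Sum of products of data and reference spectrogram samples, as in (36).

    z, s -- lists of rows; z[i][j] is the data sample and s[i][j] the
            reference sample at time index i and frequency index j.
    """
    if len(z) != len(s) or any(len(zr) != len(sr) for zr, sr in zip(z, s)):
        raise ValueError("spectrograms must have the same shape")
    return sum(zij * sij for zr, sr in zip(z, s) for zij, sij in zip(zr, sr))


def signal_present(z, s, threshold):
    """Decide 'signal present' when the correlation exceeds the threshold."""
    return spectrogram_correlation(z, s) > threshold


# Hypothetical 3 x 3 example: the reference marks a downward frequency sweep.
reference = [[0, 1, 0], [0, 1, 0], [1, 0, 0]]
data = [[0.2, 1.1, 0.3], [0.1, 0.9, 0.2], [1.3, 0.4, 0.1]]
statistic = spectrogram_correlation(data, reference)  # 1.1 + 0.9 + 1.3
```

With a reference that takes only the values zero and one, the statistic is simply the sum of the data samples that lie under the reference pattern.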

The spectrogram correlator can be adapted to non-Gaussian noise by
using a nonlinearity other than an ideal square law device at the
filter outputs. This modification is discussed by Poor and Thomas
(1978).

Equation  (36)  implies  that  spectrogram  correlation  is  ideal 
for  low  SNR,  when  the  receiver  is  constrained  to  use  the  spectro- 
gram as  data.  It  will  be  shown,  however,  that  the  performance  of 
a spectrogram  correlator  is  inferior  to  that  of  an  ideal  detector 
which  processes  the  data  itself,  when  the  signal  is  known  exactly. 

2.2.7.B  Matched  Filtering  of  Reconstructed  Data 

Pulse compression, correlation, or matched filtering is an ideal
detection method that is used in many radar, sonar, and communication
systems. Matched filtering is ideal under several criteria. The Bayes
criterion seeks a detector that minimizes the expected risk which is
incurred when costs are assigned to detection and false alarm
probabilities. The Neyman-Pearson criterion seeks to maximize
probability of detection for a given upper bound on false alarm
probability. Both the Bayes and Neyman-Pearson criteria lead to a
likelihood ratio test, and a correlation operation is an
implementation of the likelihood ratio test for detection of a known
signal in white, Gaussian noise. The signal-to-noise ratio (SNR)
criterion seeks a filter that maximizes output SNR for a known signal
in white (not necessarily Gaussian) noise. The matched filter is the
best linear filter for SNR maximization. Another criterion is not
concerned with detection of the signal, but with maximum likelihood
estimation of its epoch or time of arrival. In white Gaussian noise,
the correlator is the required epoch estimator for a known signal.

In order to implement a matched filtering operation, it is necessary
to reconstruct either the data, its energy density function, or its
ambiguity function. Reconstruction of the data, rather than the
signal, implies that $P_e(\tau,\phi) \equiv 0$ in (29) and (30). The
filter in (29) is then used to obtain an estimate of the ambiguity
function of the input data, $\chi_{dd}(\tau,\phi)$, where

$$d(t) = u(t) + n(t), \tag{37}$$

i.e., the data is the signal $u(t)$ plus a sample function of the
noise, $n(t)$.

To implement a correlation operation, Property 15 in Appendix A can be
applied. Letting $u_H(t) = u(t - \tau_H)$, where $\tau_H$ is a
hypothetical delay, we have

$$\iint \chi_{dd}(\tau,\phi)\, \chi^*_{u_H u_H}(\tau,\phi)\, d\tau\, d\phi = \left| \int d(t)\, u^*(t - \tau_H)\, dt \right|^2 \tag{38}$$

which equals $|R_u(\tau_H)|^2$, the squared magnitude of the signal
autocorrelation function, when $n(t) = 0$. A similar result can be
obtained from Property 14 if $e_{dd}(t,f)$ is obtained from
$\chi_{dd}(\tau,\phi)$ by a two-dimensional Fourier transform.
It would appear from (38) that the unknown phase constant $\lambda$
which appears in the reconstructed data (Property 9) does not affect
the capability of the receiver to perform a pulse compression
operation. To obtain further insight into this phenomenon, we shall
consider detection of the signal from the reconstructed time function
$\hat{d}(t)$, rather than from $\hat{\chi}_{dd}(\tau,\phi)$.

From Property 9, the reconstructed data $\hat{d}(t) \exp(j\lambda)$ can
be obtained from the estimated ambiguity function
$\hat{\chi}_{dd}(\tau,\phi)$, where $\hat{\chi}_{dd}(\tau,\phi)$ is
constructed as in Figure 1 with $P_e(\tau,\phi) = 0$. The reconstructed
data $\hat{d}(t) \exp(j\lambda)$ can then be passed through a matched
filter with impulse response $u^*(-t)$. The response of the filter in
the absence of noise is $R_u(t) \exp(j\lambda)$, where
$R_u(t) = \chi_{uu}(t, 0)$ is again the autocorrelation function of the
signal. The envelope detected matched filter response is $|R_u(t)|^2$
in the absence of noise. The unknown, constant phase parameter
$\lambda$ is eliminated by envelope detecting the matched filter
response, and pulse compression can be implemented even though
$\lambda$ is unknown.
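The insensitivity of the envelope detected matched filter output to a constant phase factor can be checked numerically. The sketch below is illustrative only: the chirp pulse, the delay of 5 samples, and the phase value are hypothetical, and the discrete sum stands in for the correlation integral.

```python
import cmath


def matched_filter_envelope_sq(data, reference, lag):
    """|sum_t data[t] * conj(reference[t - lag])|^2 for one hypothesized lag."""
    acc = 0j
    for t, d in enumerate(data):
        k = t - lag
        if 0 <= k < len(reference):
            acc += d * reference[k].conjugate()
    return abs(acc) ** 2


# Hypothetical test signal: a unit-amplitude linear-FM pulse, delayed by 5 samples.
pulse = [cmath.exp(1j * 0.05 * n * n) for n in range(32)]
delayed = [0j] * 5 + pulse + [0j] * 5

# Multiply the data by exp(j * lam) to mimic the unknown phase of a
# signal that has been reconstructed from its spectrogram.
lam = 1.234
rotated = [x * cmath.exp(1j * lam) for x in delayed]

peak_plain = matched_filter_envelope_sq(delayed, pulse, 5)
peak_rotated = matched_filter_envelope_sq(rotated, pulse, 5)
off_peak = matched_filter_envelope_sq(delayed, pulse, 0)
```

The two peak values agree to rounding error, while the value at the wrong lag is much smaller, so the compressed pulse is unaffected by the phase constant.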

2.2.7.C  Estimator-Correlator Configuration for
Detection of a Random Signal

Matched filtering is for detection of a known signal, but data
reconstruction from the spectrogram can be used to detect a random or
unknown signal as well as a known one. To detect a random signal, one
can sometimes take advantage of prior information about the signal and
noise in order to construct an estimate of the signal. The estimated
signal is then correlated with the original data. This
estimator-correlator configuration was first derived for Gaussian
signals in Gaussian noise (Price, 1956; Kailath, 1960), but the result
has been generalized to include non-Gaussian signals in Gaussian noise
(Kailath, 1969 and 1970; Scharf and Nolte, 1977).

In  contemporary  man-made  systems,  an  estimate  of  the 
signal  is  usually  obtained  by  using  information  about  the  power 
spectral  densities  of  the  signal  and  noise  processes  (Wiener  filter- 
ing) or  by  using  a model  of  the  signal  process  that  involves  a linear 
system  (state  variable  representation)  driven  by  white  noise  (Kalman 
filtering). 

Another specification of prior information about the signal process is
knowledge of $\overline{|\chi_{uu}(\tau,\phi)|^2}$, the expected squared
envelope of the ambiguity function that is constructed from a windowed
version of the signal. A similar noise characterization uses

$$P_e(\tau,\phi) = \overline{|N_e(\tau,\phi)|^2},$$

where (26) shows that $P_e(\tau,\phi)$ is signal-dependent as well as
noise-dependent. Given the spectrogram of the input data,
$\overline{|\chi_{uu}(\tau,\phi)|^2}$ and $P_e(\tau,\phi)$ can be used
to obtain an estimate of the signal ambiguity function
$\hat{\chi}_{uu}(\tau,\phi)$ by using $H_o(\tau,\phi)$ in (29), and a
signal estimate $\hat{u}(t) \exp(j\lambda_1)$ is obtained from
$\hat{\chi}_{uu}(\tau,\phi)$ via Property 9. An estimate of the
original data, $\hat{d}(t) \exp(j\lambda_2)$, where $d(t)$ is given by
(37), can be obtained from $\hat{\chi}_{dd}(\tau,\phi)$, where
$\hat{\chi}_{dd}(\tau,\phi)$ is constructed from the spectrogram by
letting $P_e(\tau,\phi) = 0$ in (29). Correlation of the estimated
signal $\hat{u}(t) \exp(j\lambda_1)$ with the reconstructed data
$\hat{d}(t) \exp(j\lambda_2)$ then gives a statistic that is often
suitable for detection of a random signal. Envelope detection of the
correlator output eliminates the effect of the unknown, constant phase
parameters $\lambda_1$ and $\lambda_2$ that are introduced by the
reconstruction processes.
An alternate form of the estimator-correlator is obtained by
integrating the product
$\hat{\chi}^*_{uu}(\tau,\phi)\,\hat{\chi}_{dd}(\tau,\phi)$, as in
Property 15. In either case, knowledge of
$\overline{|\chi_{uu}(\tau,\phi)|^2}$, rather than $|U(f)|^2$ or a
state variable model, is used to form an estimate of the unknown or
random signal process for use in an estimator-correlator detector.

When does a receiver know the expected ambiguity function of a random
signal? The answer lies in an input-output relation for time-varying
random channels (Bello, 1963; Daly, 1970; Venetsanopoulos, 1978). The
expected ambiguity function of the output of a time-varying random
channel with transfer function $C(t,f)$ is given by the product of the
expected ambiguity function of the input signal and the channel
time-frequency autocorrelation function $R_C(\alpha,\beta)$, where

$$R_C(\alpha,\beta) = E\{ C^*(t,f)\, C(t+\alpha,\, f+\beta) \}.$$

2.2.7.D  Relative  Performance  of  a Matched  Filter  and 
a Spectrogram  Correlator  when  a Signal  Is 
Known  Exactly 

Appendix  C indicates  that  spectrogram  correlation  is  a locally 
optimum  detection  process  if  only  the  spectrogram  of  the  data  is  avail- 
able. Such  a process  is  not  ideal,  however,  if  the  data  itself,  or  the 
data  autoambiguity  function,  is  available.  When  one  has  access  to  the 
data  or  to  its  autoambiguity  function,  a matched  filter,  or  the  equiva- 
lent process  in  (38),  is  the  best  detector  for  a known  signal,  under  a 
variety  of  criteria.  For  low  internal  noise,  the  data  or  its  autoambiguity 
function  can  be  reconstructed  from  the  spectrogram  by  two-dimensional 
deconvolution.  Given  a choice  between  non-ideal  spectrogram  corre- 
lation and  a more  complicated  ideal  operation,  it  is  important  to 
know  the  improvement  in  performance  that  can  be  expected  if  one 
implements  the  ideal  detector. 
In order to easily determine the performance of the spectrogram
correlator, the following three assumptions will be made:

1.  It will be assumed that the reference spectrogram $S(t,f)$ and
    the data spectrogram $Z(t,f)$ are represented by samples
    $s_{ij} = S(t_i, f_j)$ and $z_{ij} = Z(t_i, f_j)$ that are
    $(4B_v)^{-1}$ seconds apart in time and $(4T_u)^{-1}$ Hz apart in
    frequency.

2.  It will be assumed that $s_{ij}$ must equal either zero or one,
    i.e., that a one-bit quantization of the reference spectrogram is
    used. This assumption simplifies the calculations without leading
    to unrealistic results (Hagan and Farley, 1973). It will also be
    assumed that there are $K$ independent samples, i.e., that
    $s_{ij} = 1$ over a total of $K$ different values of $i$ and $j$.

3.  It will be assumed that the signal is power limited, but that it
    is designed for maximum energy for a given peak power $P_u$ and
    duration $T_u$. Since

    $$E_u = \int_0^{T_u} |u(t)|^2\, dt \;\leq\; T_u \max_{0 \leq t \leq T_u} |u(t)|^2,$$

    maximum energy is obtained for a rectangular envelope $|u(t)|$,
    such that $|u(t)|^2 = P_u$ for $t$ in $[0, T_u]$ and

    $$E_u = P_u T_u. \tag{39}$$

The sampling rates in assumption 1 are such that the spectrogram can
be written as a sum of orthonormal sinc functions with coefficients
that correspond to the sample values. When white, Gaussian noise is
present at the input to the system, the samples $z_{ij}$ are
statistically independent (Slepian, 1954).
The second assumption implies that the correlation of a measured
spectrogram $Z(t,f)$ with the reference spectrogram $S(t,f)$ can be
written as the sum

$$\sum_{i,j} z_{ij}\, s_{ij} = \sum_{(i,j)'} z_{ij} \tag{40}$$

where $(i,j)'$ is the set of $i,j$ such that $s_{ij} = 1$.

For Gaussian input noise, the energy detected filter outputs $z_{ij}$
are distributed as in (C7) or (C8), depending upon whether the signal
is present ($H_1$ true) or absent ($H_0$ true). Since the samples
$z_{ij}$ are statistically independent, the sum in (40) has chi-square
distribution under $H_0$ and noncentral chi-square distribution under
$H_1$. The performance of the resulting detector for a random signal
with bandwidth $B_u < B_v$ and duration $T_u \geq T_v$ has been
analyzed by Urkowitz (1967). For this case, the sum over $(i,j)'$ in
(40) becomes a sum over $i$, with a fixed value of $j$ corresponding to
the center frequency $f_j$ of the appropriate filter.
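For a central chi-square variable with an even number of degrees of freedom, the tail probability has a closed form, which gives a quick way to set a threshold for a chosen false alarm probability under $H_0$. The sketch below assumes that the summed samples have been normalized so that the sum is a standard chi-square variable, and that the number of degrees of freedom is even; the function names are hypothetical and the bisection search is just one convenient way to invert the tail.

```python
import math


def chi2_tail_even(x, dof):
    """P(X > x) for a central chi-square variable X with even dof.

    Uses exp(-x/2) * sum_{k=0}^{dof/2 - 1} (x/2)^k / k!.
    """
    if dof <= 0 or dof % 2:
        raise ValueError("dof must be a positive even integer")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, dof // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


def threshold_for_false_alarm(p_f, dof, hi=1e3, iterations=200):
    """Bisection for the threshold whose H0 tail probability equals p_f."""
    lo = 0.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if chi2_tail_even(mid, dof) > p_f:
            lo = mid          # tail still too large: raise the threshold
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical operating point: 8 degrees of freedom, 1% false alarm rate.
gamma = threshold_for_false_alarm(1e-2, dof=8)
```

The detection probability under $H_1$ requires the noncentral chi-square distribution and is not covered by this sketch.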

The probability distributions that are obtained from (40) have $K$
degrees of freedom, where $K$ is the number of independent spectrogram
samples that are added together in (40). For a spectrogram that is $K$
samples in duration and one sample wide in frequency,

$$K = (4B_v)\, T_u. \tag{41}$$

In the paper by Urkowitz, $K = 2B_v T_u$ because the baseband filter
$V(f)$ is assumed to have two-sided bandwidth $B_v$, rather than
$2B_v$.

Although $K$ and $E_u$ are often treated as independent variables,
they are linearly related when the signal has a rectangular envelope
with specified power, as in assumption 3. Solving (41) for $T_u$ and
substituting the result into (39), we obtain

$$E_u = K P_u / (4B_v). \tag{42}$$

Following Urkowitz (1967), we define the signal-to-noise ratio as

$$\mathrm{SNR} = E_u / (N_0/2) = K P_u / (2 B_v N_0) \tag{43}$$

where $N_0/2$ is the two-sided noise power spectral density. For
convenience, we define

$$P_u \equiv 2 B_v N_0, \tag{44}$$

so that

$$\mathrm{SNR} = K. \tag{45}$$

For the constant-envelope signal, the one-bit quantization for
$s_{ij}$ (assumption 2) is no longer an approximation, but is an exact
representation of the noise-free spectrogram.
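The chain of relations from the number of independent samples to the signal-to-noise ratio can be checked with a few lines of arithmetic. This is a minimal sketch; the function names and the numerical values of $B_v$, $T_u$, and $N_0$ are hypothetical.

```python
def independent_samples(b_v, t_u):
    """K = 4 B_v T_u, as in (41)."""
    return 4.0 * b_v * t_u


def signal_energy(k, p_u, b_v):
    """E_u = K P_u / (4 B_v), obtained from (39) and (41)."""
    return k * p_u / (4.0 * b_v)


def snr(k, p_u, b_v, n0):
    """SNR = E_u / (N0 / 2) = K P_u / (2 B_v N0)."""
    return signal_energy(k, p_u, b_v) / (n0 / 2.0)


# Hypothetical values: 100 Hz filter bandwidth, 50 ms pulse.
b_v, t_u, n0 = 100.0, 0.05, 1e-3
k = independent_samples(b_v, t_u)   # 20 samples
p_u = 2.0 * b_v * n0                # normalization P_u = 2 B_v N0
snr_value = snr(k, p_u, b_v, n0)    # equals K under this normalization
```

With the normalization $P_u = 2 B_v N_0$, the computed SNR equals the number of samples, so increasing the observation time and increasing the SNR are the same operation in this model.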

The above formulation allows easy comparison of the performance of a
matched filter and spectrogram correlator, since the spectrogram
correlator has been made identical to an energy detector. Figure 2
illustrates probability of detection ($P_D$) for the matched filter
(dotted lines) and energy detector (solid lines) as a function of
$(\mathrm{SNR})^{1/2}$, for various values of false alarm probability
($P_F$). The matched filter performance was obtained from Van Trees
(1968), and the energy detector performance was obtained from the ROC
curves in the paper by Urkowitz (1967).
Another comparative measure is obtained by plotting the factor
$\alpha$ such that

$$\alpha\,(\mathrm{SNR})_{MF} = (\mathrm{SNR})_{ED} \tag{46}$$

as a function of $(\mathrm{SNR})_{MF}$, where $(\mathrm{SNR})_{MF}$ and
$(\mathrm{SNR})_{ED}$ are the values of signal-to-noise ratio that give
the same performance, i.e., the same values of $P_D$ and $P_F$, for the
matched filter (MF) and energy detector (ED). Since the energy
detector is suboptimal, we expect that $\alpha \geq 1$, i.e., a larger
SNR is required in order that an energy detector can perform as well
as a matched filter. Figure 3 shows $\alpha$ versus
$(\mathrm{SNR})_{MF}$. The curve is a composite of points obtained from
Figure 2 for $P_F = 10^{-3}$, $10^{-2}$, and $10^{-1}$.

Figure  3 indicates  that,  for  more  than  four  observations, 
the  performance  of  the  energy  detector  can  be  made  equal  to  that 
of  the  matched  filter,  if  approximately  three  times  as  many  observa- 
tions are  made.  If  the  results  are  not  very  dependent  upon  (44),  it 
would  appear  that  the  signal  power  must  be  increased  by  5 dB  in 
order  to  make  the  spectrogram  correlator  equivalent  to  a matched 
filter.  The  two-dimensional  deconvolution  process  in  (A  13),  followed 
by  a matched  filtering  operation,  results  in  an  effective  increase  of 
5 dB  in  signal  energy,  if  internal  noise  is  negligible. 
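The quoted figure of 5 dB follows from the factor of three in the number of observations, since the SNR is proportional to $K$ in this model. A short check (the helper name is hypothetical):

```python
import math


def db(power_ratio):
    """Convert a power (or energy) ratio to decibels."""
    return 10.0 * math.log10(power_ratio)


extra_db = db(3.0)   # about 4.8 dB, i.e. roughly 5 dB
```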


Figure 3. For a given detection performance (specified values of
$P_D$ and $P_F$), the required signal-to-noise ratio for a matched
filter is $(\mathrm{SNR})_{MF}$. For a signal with constant power, the
same performance can be obtained with an energy detector, provided
$(\mathrm{SNR})_{MF}$ is multiplied by a factor $\alpha$. Solid line:
$\alpha$ vs. $(\mathrm{SNR})_{MF}$ for $P_u = 2 B_v N_0$.

2.2.8  Array Processing for Direction Estimation

Suppose that two or more spatially separated sensors are used for
reception of an unknown acoustic signal, and that an estimate of the
direction of the signal source is desired. Hahn and Tretter (1973) and
Hahn (1975) have shown that cross-correlation of the outputs of pairs
of sensors leads to an efficient estimate of relative delay. For a
pair of sensors with outputs $d_1(t)$ and $d_2(t)$, we want to form

$$R_{d_1 d_2}(\Delta\tau_H) = \int d_1(t)\, d_2^*(t + \Delta\tau_H)\, dt \tag{47}$$

where $\Delta\tau_H$ is any hypothesized relative delay between the two
sensor responses. The cross-correlator responses are then linearly
combined to form estimates of the relative delays between the
responses of the various sensors and the response of the first sensor.
Unspecified limits of integration correspond to $(-\infty, \infty)$.

If the signal $u(t)$ is known except for the direction of its source,
one can estimate direction by means of a correlation processor. Delay
variables $\tau_n(\theta_H)$ are generated from a direction hypothesis
$\theta_H$ and a knowledge of the relative spatial positions of the
sensors. The data $d_n(t)$ from the $n$th sensor is correlated with
$u[t + \tau_n(\theta_H)]$, and the result is added to the correlator
outputs of the other sensors. The resulting correlation function is
(Altes, 1978)

$$R_d(\theta_H) = \sum_n \int d_n(t)\, u^*[t + \tau_n(\theta_H)]\, dt. \tag{48}$$

The  operations  in  (47)  and  (48)  yield  efficient  direction  esti- 
mates in  white  Gaussian  noise  that  is  independent  from  sensor  to 
sensor.  Both  of  the  operations  involve  correlation  functions,  not 
the  envelopes  of  these  functions.  The  inadequacy  of  using  envelopes 
becomes  obvious  if  one  considers  a complex  sinusoid 


$$d_n(t) = a(t + \tau_n) \exp[\, j 2\pi f_0 (t + \tau_n) \,]$$

where $a(t)$ is a smooth, slowly varying envelope. An accurate delay
estimate is available only if the phase of the correlation process is
preserved. In (47), for example,

$$R_{d_1 d_2}(\Delta\tau_H) = \exp[-j 2\pi f_0 (\Delta\tau_H - \Delta\tau)] \int a(t)\, a^*(t + \Delta\tau_H - \Delta\tau)\, dt$$

where $\Delta\tau = \tau_1 - \tau_2$. The phase of the autocorrelation
function is a sensitive measure of $\Delta\tau_H - \Delta\tau$, but the
phase is time invariant and is eliminated when
$R_{d_1 d_2}(\Delta\tau_H)$ is envelope detected.
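The loss of narrowband delay information under envelope detection can be seen in a small numerical experiment. The sketch below is illustrative only: the Gaussian envelope, the 1 kHz carrier, the sensor delays, and the Riemann-sum integration are all hypothetical choices, and the sign convention $\int d_1(t)\, d_2^*(t + \Delta\tau_H)\, dt$ is assumed for the cross-correlation.

```python
import cmath
import math


def envelope(t, width=0.02):
    """Hypothetical smooth, slowly varying envelope a(t) (Gaussian, 20 ms scale)."""
    return math.exp(-(t / width) ** 2)


def sensor(t, tau, f0):
    """d_n(t) = a(t + tau) exp[j 2 pi f0 (t + tau)]."""
    return envelope(t + tau) * cmath.exp(2j * math.pi * f0 * (t + tau))


def cross_correlation(lag, tau1, tau2, f0, dt=1e-4, span=0.2):
    """Riemann-sum approximation of the integral of d1(t) conj(d2(t + lag)) dt."""
    n = int(round(2 * span / dt))
    total = 0j
    for i in range(n + 1):
        t = -span + i * dt
        total += sensor(t, tau1, f0) * sensor(t + lag, tau2, f0).conjugate()
    return total * dt


f0, tau1, tau2 = 1000.0, 0.0, 0.0003       # hypothetical carrier and sensor delays
true_lag = tau1 - tau2                      # relative delay between the sensors
r_true = cross_correlation(true_lag, tau1, tau2, f0)
r_off = cross_correlation(true_lag + 0.00025, tau1, tau2, f0)  # quarter-cycle error

phase_shift = cmath.phase(r_off) - cmath.phase(r_true)   # about -pi/2
envelope_ratio = abs(r_off) / abs(r_true)                 # very close to 1
```

A delay error of a quarter of a carrier cycle rotates the phase of the correlation by 90 degrees but changes its envelope by less than 0.1 percent, so a detector that keeps only the envelope cannot resolve the error.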

Operations  that  involve  the  envelopes  of  correlation  func- 
tions, as  in  (A19),  (A21),  and  (A23),  cannot  be  used  for  accurate 
narrowband  direction  estimation.  In  particular,  cross-correlation 
of  spectrograms,  as  in  Property  12  of  Appendix  A,  is  not  suitable 
for  array  processing  of  narrowband  signals.  Spectrogram  cross- 
correlation cannot  be  substituted  for  cross-correlation  of  the  signals 
themselves  because  narrowband  delay  information  is  destroyed 
when  phase  is  eliminated. 

Since an unknown constant $\lambda$ is added to the phase of a signal
that is reconstructed from a spectrogram, it would appear that such
reconstructed signals cannot be used for direction estimation. This
observation is considered in more detail in Appendix D.

A sufficient amount of phase information can be obtained if phase, as
well as amplitude, can be observed at the output of only one filter
$V(f - f_i)$, provided there is signal energy at $f_i$. This statement
follows from the fact that $\lambda$ is invariant with frequency. If
$\lambda$ can be measured for one frequency $f_i$, it is known for the
whole spectrum. This idea is discussed in more detail in Appendix D.
If the constant $\lambda$ that is added to the phase of a
reconstructed signal is known, then an input signal can be completely
determined from its spectrogram. A phase measurement from a single
filter then provides sufficient information such that reconstructed
narrowband signals from spatially separated sensors can be used for
direction estimation.

2.2.9  Application to Theories of Hearing and Animal
Echolocation

2.2.9.A  Overlapping Filters, Signal Duration, and
Just-Noticeable Frequency Difference

Sampling rates for spectrograms were discussed in Section 2.2.4.
Assuming that frequency interpolation is not used, (16) to (18)
indicate that, for two-dimensional deconvolution, there must be
significant overlap of the transfer functions $V(f - f_i)$. The
maximum signal duration is determined by the spacing $\Delta f_i$
between center frequencies of adjacent filters. For input signals that
are longer than $(2\Delta f_i)^{-1}$ seconds, the spectrogram will be
undersampled insofar as deconvolution is concerned, and the input
signal cannot be reconstructed. In this case, correlation processing
of reconstructed signals cannot be used, but spectrogram correlation
can be used instead.
The theory of signal reconstruction from spectrograms therefore leads
to the following predictions:

1.  There will be significant overlap between filters that are used
    to construct the spectrogram.

2.  There will be a maximum signal duration
    $T_{\max} = (2\Delta f_i)^{-1}$ beyond which matched filtering of
    reconstructed signals cannot be used. Signals that are longer than
    $T_{\max}$ can be detected with spectrogram correlation, which is
    not as efficient as matched filtering.

3.  The quantity $\Delta f_i$ is the difference between center
    frequencies of adjacent filters, and it corresponds to the sample
    spacing for $U(f)$, the Fourier transform of the reconstructed
    signal. The just noticeable frequency deviation of a frequency
    modulated (FM) signal should therefore correspond to
    $\Delta f_i$, where $\Delta f_i = (2T_{\max})^{-1}$.

Assuming  that  a spectrogram  is  constructed  from  the 
responses  of  hair  cells  along  the  basilar  membrane  and/or  periph- 
eral tuned  neurons,  there  is  indeed  significant  overlap  between  the 
filters  that  are  used  to  construct  the  spectrogram  (Evans,  1977; 
Pfeiffer  and  Kim,  1975). 

Detectability versus duration-of-tone data for human listeners shows
a change in slope between 107 msec and 277 msec duration (Green,
Birdsall, and Tanner, 1957) for detection of a 1 kHz tone. This change
could indicate a transition from matched filtering to spectrogram
correlation. The $\Delta f_i$ values corresponding to
$T_{\max}$ = 107 to 277 msec are $\Delta f_i$ = 4.7 Hz to 1.8 Hz. For
carrier frequencies below 2 kHz, the just noticeable frequency
deviation of an FM signal is between 1 Hz and 4 Hz (Shower and
Biddulph, 1931; Fletcher, 1953).
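The correspondence between the two transition durations and the filter spacing follows from the relation that the spacing equals $1/(2T_{\max})$. A short arithmetic check (the function name is hypothetical):

```python
def filter_spacing_hz(t_max_sec):
    """Center-frequency spacing implied by a maximum duration: 1 / (2 T_max)."""
    return 1.0 / (2.0 * t_max_sec)


spacing_long = filter_spacing_hz(0.277)    # about 1.8 Hz
spacing_short = filter_spacing_hz(0.107)   # about 4.7 Hz
```

Both values are of the same order as the just noticeable FM deviation of 1 Hz to 4 Hz quoted for carriers below 2 kHz.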


2.2.9.B  Relation Between Filter Shape and Frequency
Discrimination Capability


A discrepancy exists between the relatively wide bandwidths of neural
tuning curves (Evans, 1977; Pfeiffer and Kim, 1975) or critical bands
(Zwicker, 1961), and the ability of a listener to perform fine
frequency discriminations (Shower and Biddulph, 1931; Fletcher, 1953;
Schafer, et al., 1950). This discrepancy is often explained by
remarking that the sharp high-frequency cutoff of neural tuning curves
allows for accurate frequency discrimination, even though the tuning
curves themselves have wide bandwidths. Although this explanation is
certainly plausible, the theory of signal reconstruction from
spectrograms provides an alternative viewpoint. For zero internal
noise, (30) indicates that the accuracy of a reconstructed signal (as
measured by minimum mean-square error) is independent of the shape or
bandwidth of the filters $V(f - f_i)$ that are used to construct the
spectrogram. If internal noise can be neglected, it follows that
frequency resolution does not depend upon narrow filter bandwidths or
sharp, high frequency cutoffs.


It can be conjectured that the sharp, high frequency cutoff that is
often observed in neural tuning curves is such as to minimize the
mean-square error in (30) when internal noise cannot be neglected.
Rapid cutoff may also be useful for low pass filtering of band limited
signals. Conversely, one should be ready to accept measurements that
do not reveal a rapid, high frequency cutoff, e.g., the revcor
functions of de Boer and de Jongh (1978).


2.2.9.C  Critical  Bands 


For masking of narrowband signals by narrowband noise, there is a
critical ratio of signal center frequency to noise center frequency
such that the masker has little or no effect (Schafer, et al., 1950).
If this ratio is approximately invariant with signal frequency, one
can postulate the existence of filters with critical bandwidths that
are proportional to frequency (Zwicker, 1961). In Appendix E, it is
heuristically argued that a critical band effect will occur when a
signal is estimated from the spectrogram of noise-corrupted data, and
it is shown that critical bands can also result from a spectrogram
correlation operation.

2.2.9.D  Monaural  Phase  Deafness

By using Properties 8 and 9 in Appendix A, it has been
shown that a complex analytic signal $u(t)\exp(j\lambda)$ can be reconstructed
from the spectrogram $S_{uv}(t,f)$, where $\lambda$ is an unknown
constant. If the spectrogram model applies to human audition, it
follows that people should be monaurally insensitive to an arbitrary
phase shift $\lambda$. For example, $u_r(t)$, $\hat{u}_r(t)$, $-u_r(t)$, and $-\hat{u}_r(t)$ should
be indistinguishable, where $\hat{u}_r(t)$ is the Hilbert transform of $u_r(t)$.

This predicted phase insensitivity is more restrictive than
insensitivity to the frequency dependent phase transformation that
has been suggested by Schroeder (1975). Schroeder, however, has
pointed out that the experiments of Craig and Jeffress (1962) and
Terhardt and Fastl (1971) indicate a sensitivity to his more general
frequency dependent phase transformation. The experiments, however,
do not indicate any sensitivity to the frequency invariant phase
shift that transforms $U(f)$ into $U(f)\exp(j\lambda)$.
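The invariance of the spectrogram to a constant phase factor can be checked numerically. The following Python sketch is an illustration only: the Gaussian-windowed test tone, the Gaussian analysis window, the cyclic (wrap-around) time shifts, and the helper names `analytic_signal` and `spectrogram` are assumptions made for the example and do not come from the report.

```python
import numpy as np

def analytic_signal(x):
    """Complex analytic signal of a real, even-length sequence (FFT method)."""
    N = len(x)
    X = np.fft.fft(x)
    h = np.zeros(N)
    h[0] = 1.0            # keep DC
    h[1:N // 2] = 2.0     # double positive frequencies
    h[N // 2] = 1.0       # keep the Nyquist bin
    return np.fft.ifft(X * h)

def spectrogram(u, window):
    """S[m, k] = |sum_n u[n] window[(m - n) mod N] exp(-j 2 pi k n / N)|^2."""
    N = len(u)
    n = np.arange(N)
    return np.array([np.abs(np.fft.fft(u * window[(m - n) % N])) ** 2
                     for m in range(N)])

N = 128
t = np.arange(N)
u_r = np.exp(-0.5 * ((t - 64) / 12.0) ** 2) * np.cos(2 * np.pi * 0.2 * t)
u = analytic_signal(u_r)          # real part u_r, imaginary part its Hilbert transform
window = np.exp(-0.5 * (((t + N // 2) % N - N // 2) / 6.0) ** 2)

S_ref = spectrogram(u, window)
# Phase shifts of 0, 90, 180 and 270 degrees, plus one arbitrary value.
phases = (0.0, np.pi / 2, np.pi, 3 * np.pi / 2, 1.234)
max_diff = max(np.max(np.abs(spectrogram(u * np.exp(1j * lam), window) - S_ref))
               for lam in phases)
print("largest spectrogram change over all phase shifts:", max_diff)
```

The printed difference should be at the level of floating-point rounding, since multiplying $u(t)$ by $\exp(j\lambda)$ changes only the phase of each filter output and the spectrogram retains only magnitudes.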
2.2.9.E  Envelope  Variations  and  Phase-Locked  Neural
Firing  Patterns


Hartmann (1978) has recently found that human listeners are
sensitive to the form of the envelope of a sinusoid, u(t). One of
Hartmann's explanations for this phenomenon is that the time-window
spectrogram

$$S^{T}_{u(t),\,w(-t)}(t_1,f_1) = \left| \int w(t_1-t)\,u(t)\,\exp(-j2\pi f_1 t)\,dt \right|^2$$

is different for different envelopes, and that the human auditory
system may transform input signals into such a function. Eq. (A25)
shows that

$$S^{T}_{u(t),\,w(-t)}(t_1,f_1) = S_{uv}(t_1,f_1) \quad \text{if} \quad w(t) = v(t),$$

where $S_{uv}(t_1,f_1)$ is the frequency-window spectrogram in (8).
Hartmann's results can thus be explained in terms of a frequency-window
spectrogram model of audition.
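The equality of the time-window and frequency-window spectrograms when the window equals the filter impulse response can be illustrated with a discrete, cyclic analogue. The sketch below is not taken from the report: it assumes length-N sequences with wrap-around indexing, writes the frequency-window form as the squared magnitude of the inverse DFT of U[l] V[l - k], and uses hypothetical function names.

```python
import numpy as np

def time_window_spectrogram(u, w):
    """S[m, k] = |sum_n w[(m - n) mod N] u[n] exp(-j 2 pi k n / N)|^2."""
    N = len(u)
    n = np.arange(N)
    S = np.empty((N, N))
    for m in range(N):
        S[m] = np.abs(np.fft.fft(w[(m - n) % N] * u)) ** 2
    return S

def frequency_window_spectrogram(u, v):
    """S[m, k] = |(1/N) sum_l U[l] V[(l - k) mod N] exp(+j 2 pi l m / N)|^2."""
    N = len(u)
    U, V = np.fft.fft(u), np.fft.fft(v)
    l = np.arange(N)
    S = np.empty((N, N))
    for k in range(N):
        # Filter centred on bin k, then return to the time index m.
        S[:, k] = np.abs(np.fft.ifft(U * V[(l - k) % N])) ** 2
    return S

rng = np.random.default_rng(0)
N = 32
u = rng.standard_normal(N) + 1j * rng.standard_normal(N)   # arbitrary test signal
v = np.exp(-0.5 * (np.arange(N) / 3.0) ** 2)               # window = filter response

S_time = time_window_spectrogram(u, v)
S_freq = frequency_window_spectrogram(u, v)
print("max difference:", np.max(np.abs(S_time - S_freq)))
```

In this discrete setting the two filter outputs differ only by the unit-magnitude factor exp(j 2 pi k m / N), which disappears when the magnitude is squared.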

The  equality  w(t)  = v(t)  also  implies  that  one  can  find  a more 
realistic  weighting  function  model  than  an  exponential,  which  is  often 
employed  for  generation  of  time-window  spectrograms  (Hartmann, 
1978;  Fletcher,  1953;  Sayers  and  Cherry,  1957).  For  example,  one 
can  use  a baseband  version  of  a reverse-correlation  impulse  response
(de  Boer  and  de  Jongh,  1978). 

Siebert (1970) has applied Cramér-Rao bounds to rate-modulated
Poisson processes, in an attempt to deduce the
mechanisms for auditory frequency discrimination. He has shown
that  utilization  of  the  timing  information  from  phase-locked  neural 
firing  patterns  should  improve  frequency  discrimination  capability. 
Anderson,  et  al.  (1971)  have  discovered  such  phase-locked  neural 
responses  for  frequencies  up  to  4 to  6 kHz.  Despite  these  observa- 
tions, Siebert  has  concluded  that  phase-locked  neural  firing  patterns 
are  not  used  for  auditory  frequency  discrimination.  Monaural  fre- 
quency discrimination  is  apparently  based  upon  the  envelopes  of 
the  neural  filters'  responses,  or  upon  analysis  of  (i)  the  spectrogram, 
(ii)  a reconstructed  signal  ambiguity  function,  or  (iii)  the  recon- 
structed signal  itself. 

2.2.9.F  Energy  Detection,  Spectrogram  Correlation, 
or  Matched  Filtering? 

For signals that are longer than $(\Delta f_1)^{-1}$, where $\Delta f_1$ is the
difference between center frequencies of adjacent filters, the sample
density  that  is  required  for  signal  reconstruction  exceeds  the  sample 
density  in  the  frequency  direction  that  is  used  to  characterize  the 
spectrogram.  In  this  case,  the  input  signal  cannot  be  reconstructed 
from  the  spectrogram  unless  the  data  is  time-gated,  and  spectro- 
gram correlation  becomes  an  attractive  alternative.  It  has  been 
demonstrated,  however,  that  spectrogram  correlation  is  sometimes 
the  same  thing  as  energy  detection.  The  two  operations  are  identi- 
cal when  all  nonzero  samples  of  the  reference  spectrogram  have 
the  same  magnitude  and  when  signal  duration  is  prespecified. 

For  signal  durations  that  are  longer  than  ~200  msec,  one 
must  resort  to  spectrogram  correlation.  For  shorter  durations, 
however,  matched  filtering  can  theoretically  be  performed  on  a 
reconstructed  version  of  the  signal.  Does  the  data  support  such  a 
matched  filtering  hypothesis? 
Gated  sinusoids  have  often  been  used  as  stimuli  for  monaural
detection  experiments.  A matched  filter  for  a short  duration  sinusoid 
has  wider  bandwidth  than  a matched  filter  for  a sinusoid  with  longer 
duration.  The  signal  reconstruction/matched  filter  hypothesis  there- 
fore predicts  that  a listener’s  performance  should  be  similar  to  that 
of  a filter  followed  by  a threshold  device,  and  that  the  corresponding 
filter  should  have  broader  bandwidth  as  signal  duration  is  decreased. 
In  fact,  several  researchers  have  interpreted  their  masking  data  in 
this  way,  i.  e.  , they  have  found  that  a filtering  model  is  relevant, 
provided  that  the  filter  bandwidth  broadens  as  the  duration  of  a 
sinusoidal  signal  becomes  smaller,  for  durations  less  than  100  msec 
(Srinivasan,  1971). 

Although effective bandwidths increase as signal duration
becomes smaller, the bandwidths are dependent upon the frequency
of a tone as well as upon its duration, and the bandwidths are generally
larger than those of a matched filter. For an 11-msec tone,
Srinivasan (1971) has reported effective bandwidths that are roughly
165 Hz wide for a tone frequency of 880 Hz and 180 Hz wide at
1110 Hz, for a rectangular filter transfer function model. The
matched filter for the tone has a bandwidth of roughly 90 Hz. A
90-Hz bandwidth is observed for a 23-msec tone, which should
correspond to a 39-Hz bandwidth under a matched filter hypothesis.
The observed bandwidths are therefore approximately twice the
bandwidth of an ideal matched filter.
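The "approximately twice" statement can be checked with simple arithmetic on the figures quoted above. The short Python sketch below uses only those quoted numbers; the dictionary labels are illustrative, and the reciprocal-duration rule (bandwidth about 1/T) is an assumption added for comparison, not a formula stated in the report.

```python
# Observed effective bandwidths and matched-filter bandwidths as quoted in the text (Hz).
observed_hz = {"11 msec, 880 Hz": 165.0, "11 msec, 1110 Hz": 180.0, "23 msec": 90.0}
matched_hz = {"11 msec, 880 Hz": 90.0, "11 msec, 1110 Hz": 90.0, "23 msec": 39.0}

ratios = {label: observed_hz[label] / matched_hz[label] for label in observed_hz}
for label, ratio in ratios.items():
    print(f"{label}: observed / matched = {ratio:.2f}")

# Assumed rule of thumb: a gated tone of duration T has bandwidth near 1/T.
reciprocal_rule_hz = 1.0 / 0.011
print(f"1/T for an 11-msec tone: {reciprocal_rule_hz:.1f} Hz")
```

The three ratios fall between about 1.8 and 2.3, which is the sense in which the observed bandwidths are roughly twice those of an ideal matched filter.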

Curves showing probability of a correct response P(C)
versus SNR can be analytically obtained for a two-alternative,
forced-choice (2AFC) experiment, as discussed in Appendix F.
The curve for a matched filter always has steeper slope than the
energy detector curve. From Figure 8-3 of Green and Swets
(1966), one can see that a best fit to the experimental P(C) vs. SNR
data for detection of a gated sine wave gives a curve with a slightly
steeper slope than that predicted by an energy detector. This
steeper slope may be indicative of an imperfect matched filter
(e.g., one with too wide a bandwidth) or of an estimator-correlator
configuration.


The detection performance of human observers for a sum of
gated sine waves with different frequencies has been investigated
by Green (1958) and by Green, McKey, and Licklider (1959).
Performance is in reasonably close agreement with a detector that
sums the squared energies of the M different sinusoids, i.e., a
detector that computes $\sum_{j=1}^{M} E_j^2$. A matched filter would linearly
combine the energies of the different signal components, i.e., its
performance would depend upon $\sum_{j=1}^{M} E_j$. Multiple harmonics are
therefore apparently not detected with a matched filter mechanism
in humans.

A spectrogram correlator computes a weighted sum of the
filter response envelopes $z_{ij}$, where the weights are proportional
to the expected values of these responses in the absence of noise,
$s_{ij}$ (see Appendix C). When the sinusoidal signal component that
excites the $j$th filter has a rectangular envelope, then $s_{ij} = s_j$ for
all time samples $i = 1, 2, \ldots, N$, and

$$\sum_{j=1}^{M}\sum_{i=1}^{N} s_{ij}\,z_{ij} = \sum_{j=1}^{M} s_j \sum_{i=1}^{N} z_{ij} .$$

When multiple components with different energy levels are present,
i.e., when $s_j$ is different for different j-values, then a spectrogram
correlator is no longer identical to an energy detector, as was the
case for a single sinusoid with rectangular envelope and specified
duration.

Since $s_j$ and the expected value of $\sum_{i=1}^{N} z_{ij}$ are both proportional
to the energy $E_j$ of the $j$th sinusoidal signal component,
we have

$$\mathrm{E}\Bigl\{\sum_{j=1}^{M}\sum_{i=1}^{N} s_{ij}\,z_{ij}\Bigr\} \propto \sum_{j=1}^{M} E_j^2 ,$$

where $\mathrm{E}\{\cdot\}$ denotes the expected value.

A spectrogram  correlator  computes  a weighted  sum  of  envelope 
detected  filter  responses,  and  the  expected  value  of  this  weighted 
sum  is  proportional  to  the  sum  of  the  squared  energies  in  the 
sinusoidal  signal  components.  This  behavior  is  similar  to  human 
detection  of  a superposition  of  gated  sine  waves  (Green,  1958). 
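The scaling argument above can be made concrete with a small numerical sketch. The Python code below is an illustration under stated assumptions: the proportionality constants are fixed arbitrarily so that each weight is $E_j/N$, the expected envelopes are taken equal to the weights (the noise-free case), and the function names are hypothetical.

```python
import numpy as np

def expected_correlator_output(energies, n_samples):
    """Expected value of sum_j sum_i s_ij z_ij for rectangular envelopes.

    Assumes s_ij = E_j / N for every time sample i and that the expected
    envelope response equals s_ij; both constants are arbitrary choices.
    """
    E = np.asarray(energies, dtype=float)
    s = np.tile(E / n_samples, (n_samples, 1))   # s[i, j] = s_j for all i
    z_mean = s.copy()                            # expected envelopes, no noise
    return float(np.sum(s * z_mean))             # equals sum(E**2) / N

def linear_energy_sum(energies):
    """Statistic on which a matched filter's performance would depend: sum_j E_j."""
    return float(np.sum(energies))

E = np.array([1.0, 2.0, 3.0])   # energies of M = 3 sinusoidal components
N = 10                          # number of time samples

out = expected_correlator_output(E, N)
ratio_correlator = expected_correlator_output(2 * E, N) / out
ratio_linear = linear_energy_sum(2 * E) / linear_energy_sum(E)
print(out, ratio_correlator, ratio_linear)
```

Doubling every component energy multiplies the expected correlator output by four but the linear energy sum by only two, which is the distinction drawn between the two detector models.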

An estimator-correlator that uses the spectrogram as data
functions like a spectrogram correlator, except that it has no prior
knowledge of the noise-free filter response envelopes $s_j$. The
device forms

$$\sum_{j=1}^{M} \hat{s}_j \sum_{i=1}^{N} z_{ij} ,$$

where $\hat{s}_j$ is an estimate of $s_j$ that is obtained from the data spectrogram,
$Z(t_i, f_j)$. This estimate is based upon the conditional mean of
the  data  spectrogram  (the  mean  of  the  probability  distribution  that 




portional to $E_j$, a spectrogram estimator-correlator is again
expected  to  behave  like  a detector  that  computes  the  sum  of  the 
squared  energies  of  the  sinusoidal  signal  components. 


2.2.9.G  Echolocation 

Bats  and  dolphins  echolocate  with  ultrasonic  signals  (Griffin, 
1958;  Evans,  1973;  Airapetyants  and  Konstantinov,  1970).  Bats, 
in  particular,  use  signals  that  are  suggestive  of  a pulse  compression 
or  matched  filtering  capability  (Cahlander,  1967;  Kroszczynski, 

1969;  Altes  and  Titlebaum,  1970,  1975).  Echolocation  performance 
of  some  bats  is  also  indicative  of  such  a capability  (Simmons,  1973). 

Ultrasonic signals are unlikely to possess sufficient energy
below 6 kHz such that the unknown phase, $\lambda$, in the reconstructed
waveform can be estimated, when echoes are obtained from their
spectrograms. It was shown in Section VII B that pulse compression
can nevertheless occur without knowledge of $\lambda$, and $\lambda$ is eliminated
by envelope detecting the matched filter response.

The  possible  existence  of  an  ideal  detector  in  animal  echo- 
location  can  therefore  be  theoretically  justified  by  a signal  recon- 
struction argument,  even  if  echoes  are  initially  transformed  into 
spectrograms  by  the  peripheral  auditory  system.  It  was  shown  in 
Section 2.2.7.D that if a spectrogram correlation process is used
instead  of  reconstruction/matched  filtering,  the  penalty  is  an 
approximate  5 dB  loss  in  signal-to-noise  ratio. 

Ideal detection is especially useful for sonar targets that
are far away from the animal. Bats tend to use comparatively long-duration
"cruising pulses" to detect such targets, i.e., the signal
energy is made relatively large even though peak power is constrained.
Figure 2 shows that this strategy improves the detection performance
of both matched filter and energy detector (or spectrogram
correlator). If only spreading loss is considered, a 5 dB improvement
in SNR increases the range of a sonar by one third, i.e.,

$$10 \log_{10}\left[(R_1/R_2)^4\right] = 5\ \text{dB} \quad \text{when} \quad R_1 = 1.33\,R_2 .$$
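The range figure follows from inverting the fourth-power spreading law. A minimal Python sketch of that arithmetic is given below; the function name and its `spreading_exponent` parameter are illustrative and are not taken from the report.

```python
def range_ratio(snr_gain_db, spreading_exponent=4.0):
    """Range ratio R1/R2 that satisfies 10*log10((R1/R2)**exponent) = snr_gain_db."""
    return 10.0 ** (snr_gain_db / (10.0 * spreading_exponent))

ratio = range_ratio(5.0)
print(f"A 5 dB gain extends the range by a factor of {ratio:.3f}")
```

With the exponent of 4 that appears in the expression above, a 5 dB gain gives a range factor of about 1.33, i.e., an increase of one third.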

In the special case of animal echolocation, a phase change
that is introduced by a target can affect the energy spectral density
of the signal-echo pair, when both transmitted and received pulses
are processed as one waveform. For a target that is, say, 15 m
from a dolphin, the transmitted signal and the echo can be processed
as a single waveform if the integration time of each spectrogram filter
is at least 20 msec, which seems very reasonable. Johnson and
Titlebaum (1976) have used this concept to formulate the hypothesis
that range measurement in animal echolocation may be associated
with estimation of time separation pitch. The theoretical phase
sensitivity of the process is demonstrated as follows.

Let $u(t)$ be the transmitted pulse and let the echo be
$\alpha u(t-\tau)\exp(j\lambda_1)$, where $\alpha$ is an attenuation factor, $\tau$ is delay,
and $\lambda_1$ is a phase shift that is introduced by the reflection process.
The magnitude-squared Fourier transform of $[u(t) + \alpha u(t-\tau)\exp(j\lambda_1)]$
is $|U(f)|^2\,[1 + \alpha^2 + 2\alpha\cos(2\pi f\tau - \lambda_1)]$, where $U(f)$ is the Fourier
transform of $u(t)$. The phase factor $\lambda_1$ thus influences the shape of
the energy spectral density of the signal-echo pair. The energy
spectral density can be estimated from the spectrogram by using
Property 5 in Appendix A.
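The rippled energy spectrum of a pulse plus its attenuated, delayed, phase-shifted echo can be verified with a discrete Fourier transform. The Python sketch below assumes a cyclic delay of an integer number of samples and arbitrary test values for the attenuation, delay, and reflection phase (named `alpha`, `delay`, and `lam1` here); none of these values come from the report.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 256
u = rng.standard_normal(N) + 1j * rng.standard_normal(N)   # transmitted pulse samples

alpha, delay, lam1 = 0.6, 17, 0.8      # attenuation, delay in samples, reflection phase
echo = alpha * np.roll(u, delay) * np.exp(1j * lam1)       # alpha * u[n - delay] * exp(j*lam1)

k = np.arange(N)                       # DFT bin index; frequency is k / N cycles per sample
lhs = np.abs(np.fft.fft(u + echo)) ** 2
rhs = np.abs(np.fft.fft(u)) ** 2 * (
    1 + alpha ** 2 + 2 * alpha * np.cos(2 * np.pi * k * delay / N - lam1)
)
print("max relative error:", np.max(np.abs(lhs - rhs)) / np.max(rhs))
```

Changing `lam1` shifts the cosine ripple along the frequency axis without altering its period, which is how the reflection phase shows up in the energy spectral density.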

The phase parameter $\lambda_1$ should be especially useful for
underwater echolocation, since it can be used to tell the difference
between a target with large acoustic impedance (e.g., a rock) and
one with small impedance (e.g., an air bladder). Phase is also a
measure  of  radial  velocity  if  a linear  period  modulated  signal  (a 
typical  FM  bat  pulse)  is  transmitted  (Altes  and  Skinner,  1977). 
Spectrogram  processing  does  not  necessarily  eliminate  information 
about  the  relative  phase  between  two  pulses,  and  this  information 
may  be  useful  for  echolocation. 

2.2.9.H  Binaural  Processing  for  Direction  Estimation 

In Section 2.2.8, it was shown that direction estimation of
narrowband signals is theoretically dependent upon a measurement
of relative phase between sensors, i.e., between the two ears.
Although this phase information is destroyed in the formation of a
separate spectrogram at each ear, it can be regained if the response
of one or more filters (or tuned neurons) can be observed before
envelope detection takes place (Appendix D).

The  localization  experiments  of  Sayers  and  Cherry  (1957)  indi- 
cate that  binaural  direction  estimation  in  humans  can  be  accomplished 
with  sine  waves  that  have  equal  amplitude  at  both  ears.  Since  sensi- 
tivity to  a phase  difference  between  the  ears  is  required  for  such  an 
estimate,  binaural  processing  in  humans  must  utilize  phase-locked 
responses  such  as  those  measured  by  Anderson,  et  al.  (1971). 

Phase-locked responses occur only at low frequencies (less
than 4 to 6 kHz). It is shown in Appendix D, however, that a
phase measurement at a single, low frequency suffices to completely
characterize the input signal at all frequencies. The
observed phase sensitivity in binaural interaction can thus be
explained by cross-correlation of signals that are reconstructed
from spectrograms, provided that low-frequency, tuned-neuron
responses are used to solve for the unknown phase parameters in
the reconstructed signals.

2.2.9.I  Implementation  of  a Two-Dimensional
Deconvolution  Operation

Reconstruction  of  an  input  signal  or  its  ambiguity  function 
from  a spectrogram  requires  two-dimensional  deconvolution.  Most 
engineers  think  of  implementing  such  an  operation  by  means  of 
Fourier  transforms.  Approximate  deconvolution  techniques,  how- 
ever, are  not  nearly  so  complicated  (Papoulis,  1972a).  The 
apparent  complexity  of  the  reconstruction  operation,  then,  is  not 
necessarily  a deterrent  to  its  implementation  in  biological  systems. 

Another  approximate  reconstruction  method  has  been  used  in  speech 
vocoding  (Flanagan,  1965). 
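For reference, the Fourier-transform route to two-dimensional deconvolution mentioned above can be sketched in a few lines. The Python example below is a generic illustration, not the approximate method of Papoulis (1972a) or of Flanagan (1965): it assumes noise-free data, cyclic convolution, and a kernel whose two-dimensional transform has no zeros, and the function names are hypothetical.

```python
import numpy as np

def circular_convolve2d(x, h):
    """Two-dimensional cyclic convolution computed with FFTs."""
    return np.fft.ifft2(np.fft.fft2(x) * np.fft.fft2(h))

def fourier_deconvolve2d(y, h, eps=1e-12):
    """Divide the transform of the data by the transform of the kernel."""
    H = np.fft.fft2(h)
    if np.min(np.abs(H)) < eps:
        raise ValueError("kernel transform has (near) zeros; plain division is unsafe")
    return np.fft.ifft2(np.fft.fft2(y) / H)

rng = np.random.default_rng(1)
x = rng.standard_normal((16, 16))          # unknown two-dimensional function

# Smoothing kernel: unit centre tap plus four neighbours of weight 0.2,
# so the magnitude of its transform is at least 1 - 0.8 = 0.2 everywhere.
h = np.zeros((16, 16))
h[0, 0] = 1.0
h[0, 1] = h[1, 0] = h[0, -1] = h[-1, 0] = 0.2

y = circular_convolve2d(x, h)              # smeared observation
x_hat = fourier_deconvolve2d(y, h).real
print("max reconstruction error:", np.max(np.abs(x_hat - x)))
```

With noisy data the plain division would amplify noise wherever the kernel transform is small, which is one reason approximate or regularized methods are of interest.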

Efficient  reconstruction  methods  and  the  accuracy  of  approxi- 
mate methods  may  depend  upon  the  form  of  the  signal  itself.  Are 
echolocation  signals  designed  so  that  echo  spectrograms  can  be
easily  transformed  back  into  the  original  echo  data?  Are  some  sig- 
nals (e.g.,  frequency  modulated  "chirps")  more  easily  detected 
than  others  (e.g.,  sinusoids)? 

2.2.9.J  Design  of  Hearing  Aids

Signal  reconstruction  from  spectrograms  should  be  useful 
for  hearing  aid  design.  Suppose  that  the  auditory  system  actually 
forms  a spectrogram  of  an  input  signal.  A corresponding  spectro- 
gram can  be  synthesized  with  man-made  filters  that  are  as  similar 
as  possible  to  those  of  the  listener.  The  synthesized  spectrogram 
can then be altered so as to improve the listener's internal representation
of the signal. A new signal is constructed from the altered
spectrogram by using the reconstruction operations in Properties
8 and  9 of  Appendix  A.  The  new  signal  is  then  presented  to  the 
listener. 

The  philosophy  for  this  kind  of  hearing  aid  design  is  to 
transform  the  signal  into  a realistic  perceptual  space,  to  operate 
upon  the  transformed  signal  in  this  perceptual  space,  and  then  to 
convert  the  result  back  into  an  acoustic  waveform  for  presentation 
to  the  listener.  The  last  step  in  this  process  depends  upon  the 
reconstruction  of  a signal  from  its  spectrogram. 

2.2.9.K  Time  and  Place  Theories  of  Audition 

In  order  to  reconstruct  a signal  from  its  spectrogram,  one 
must  have  a record  of  the  temporal  variation  of  the  envelope  of 
each  filter's  response.  Neither  time  variation  of  the  envelope  nor 
place  (filter  center  frequency)  information  can  be  discarded,  if 
there  is  to  be  sufficient  information  for  signal  reconstruction. 

2.2.10  Summary 

Properties  and  interrelationships  of  various  waveform  descrip- 
tions, i.  e.  , the  complex  analytic  signal  representation,  the  spectro- 
gram, the  time-frequency  energy  density  function,  and  the  ambiguity 
function,  have  been  assembled  in  Appendix  A.  By  combining  several 
of  these  properties,  we  have  found  that  a signal  can  be  reconstructed 
from its spectrogram, except for a complex factor with unit magnitude,
$\exp(j\lambda)$. The time invariant (or frequency invariant) part of
a signal's  phase  function  is  then  the  only  information  that  is  lost  when 
the  signal  is  converted  into  a spectrogram.  Variable  bandwidth 
filters  can  be  taken  into  account  by  a nonlinear  transformation  of 
the  frequency  scale  before  a spectrogram  is  formed. 
Practical  reconstruction  of  a signal  from  its  spectrogram  is
affected  by  sampling  rates  and  by  additive  noise.  The  spectrogram 
must  usually  be  sampled  at  a much  higher  rate  for  signal  reconstruc- 
tion than  for  an  efficient  representation  of  the  spectrogram  itself. 

The  reconstruction  method  can  be  modified  by  using  a two-dimensional 
filter  that  depends  upon  internal  and  external  noise,  rather  than  the 
deconvolution  operation  that  is  used  for  noise-free  data. 

It is obvious that a nonideal detection process can be performed
by comparing a data spectrogram with the spectrogram of a
known signal. For low signal-to-noise ratio, the best way to implement
this comparison is by correlation of data and reference spectrograms.
In order for such a spectrogram correlator to perform as
well as a matched filter, the energy of the input signal must be
increased by about 5 dB.

A matched  filter  can  be  applied  to  signals  that  have  been 
reconstructed  from  a spectrogram.  An  equivalent  process  is  to 
integrate  the  product  of  the  data  autoambiguity  function  and  the  con- 
jugated autoambiguity  function  of  the  reference  signal.  Complete 
reconstruction  of  the  input  data  is  therefore  unnecessary  once  the 
data  ambiguity  function  has  been  estimated. 

When prior statistical information about a random signal is
available, an estimator-correlator is often superior to an ordinary
energy detector. Signals that are estimated from noise-corrupted
spectrograms are especially useful for an estimator-correlator
configuration if the prior information is given as the ensemble average
of the envelope of the signal's autoambiguity function. Prior
information of this type is available when a signal has been passed through
a time-varying random filter with known time-frequency autocorrelation
function.

From the viewpoint of optimum array processing, a spectrogram
representation is inadequate, since it does not provide an
efficient direction estimate if the signal has narrow bandwidth. A
complete description of the input signal is necessary in this case.
Such a description can be obtained by combining the spectrogram
with a phase measurement. The phase measurement is obtained
from the pre-detected response of one of the filters that is used to
construct the spectrogram.

The above theoretical concepts have been compared with
some facts and theories about hearing and animal echolocation.

The  sampling  rates  for  reconstruction  of  an  input  signal 
require  overlapping  neural  or  hair  cell  tuning  curves,  if  these  tuned 
neurons  are  analogous  to  the  filters  and  envelope  detectors  that  are 
used  for  construction  of  a spectrogram.  A fixed  sampling  rate  in 
the  frequency  direction  also  results  in  a maximum  signal  duration 
beyond  which  the  input  signal  cannot  be  reconstructed  and  coherent 
processing  cannot  be  applied.  The  predicted  overlap  of  neural 
tuning  curves  and  a change  in  listener  performance  for  detection  of 
signals  longer  than  200  msec  have  both  been  observed  in  mammalian 
audition. 

Another  interesting  prediction  is  that,  for  negligible  internal 
noise,  tuning  curves  with  narrow  bandwidths  or  sharp,  high  frequency 
cutoffs  are  not  necessary  for  accurate  frequency  discrimination. 

This  prediction  is  relevant  because  tuning  curves  with  sharp  cutoffs 
are  not  always  experimentally  observed. 

For masking of a narrowband signal with narrowband noise,
both spectrogram correlation and signal reconstruction operations
are associated with critical band effects. For masking of a gated
sinusoid by wideband noise, the reconstruction/matched filter
hypothesis predicts an increase in effective filter bandwidth when
signal duration is decreased, and this bandwidth increase has been
experimentally observed (although the observed bandwidths are too
wide).

A natural  consequence  of  the  reconstruction  hypothesis  is 
monaural  insensitivity  to  the  phase  of  a complex,  analytic  waveform. 
Binaural  processing  in  humans,  however,  is  phase  sensitive.  This 
phenomenon  can  only  be  explained  by  assuming  that  at  least  one 
predetected  filter  output  is  available.  This  assumption  is  justified 
for  low  frequency  neural  filters,  since  phase-locked  discharges 
have  been  observed  below  4 to  6 kHz.  A broadband  signal  with  low 
frequency  components  or  a low  frequency  narrowband  signal  can 
thus  be  completely  reconstructed  for  binaural  processing. 

A spectrogram  representation  implies  that  sinusoids  with 
different  envelopes  should  be  distinguishable.  This  envelope  sensi- 
tivity has  been  experimentally  observed. 

Another  consequence  of  signal  reconstruction  is  the  possi- 
bility that  matched  filtering,  rather  than  energy  detection,  can  be 
used  by  a listener  in  a two-alternative,  forced  choice  between  signal 
plus  noise  and  noise  alone.  In  fact,  the  experimental  results  for 
2AFC  detection  of  a single  tone  seem  to  indicate  an  imperfect
matched  filter  model,  or  perhaps  an  estimator-correlator  model. 
Human  detection  performance  for  a sum  of  sinusoids  with  different 
frequencies,  however,  seems  to  rule  out  a matched  filter  and 
instead  favors  a spectrogram  correlation  or  spectrogram  estimator- 
correlator  process. 

In animal echolocation, a spectrogram-like signal representation
seems to imply that detection is best modelled by a spectrogram
correlator rather than by an ideal detector. We have demonstrated,
however, that matched filtering is still a possibility. A 5 dB SNR
improvement (or an increase in maximum range by a factor of 1.3)
is obtained if the ideal processor is implemented and the signal is
known exactly.

One  of  the  most  exciting  applications  of  the  analysis  is  a 
new  concept  for  the  design  of  hearing  aids.  A hearing  aid  would 
operate  on  the  spectrogram  of  an  input  signal,  and  the  altered 
spectrogram  would  be  transformed  back  into  a new  acoustic  signal 
for  the  benefit  of  a listener. 
2.2.11  Conclusion

In  designing  an  efficient  speech  encoder/synthesizer,  a
hearing  aid,  or  a sonar  that  is  built  to  simulate  animal  echolocation, 
it  is  important  to  know  what  information  is  destroyed  by  the  peripheral 
auditory  system.  This  information  can  then  be  neglected  in  the 
corresponding  man-made  systems  and  can  lead  to  a significant 
reduction  in  complexity. 

Much  of  the  neurophysiological  and  psychoacoustic  data  indi- 
cate that  the  peripheral  auditory  system  can  be  represented  by  a 
spectrogram  synthesizer  with  some  nonlinear  filters  and  probabilistic 
components.  Given  a linear,  deterministic  version  of  this  model, 
we  have  shown  that  the  central  nervous  system  probably  has  suffi- 
cient information  to  reconstruct  the  input  signal,  except  for  a multi- 
plicative constant  with  unit  magnitude  and  unknown  phase. 

Unfortunately, this result does little to simplify the design
of man-made devices, since it implies that very little information
may actually be destroyed by peripheral sound transduction and
neural encoding in mammals. In a simulation of an animal sonar,
for example, a matched filter may still be relevant.

The  results  of  the  analysis,  however,  can  be  viewed  in  a 
more  optimistic  light.  If  a spectrogram  is  indeed  analogous  to  the 
representation  of  a sound  that  is  sent  to  the  brain,  then  signal 
processing  operations  that  are  performed  on  the  spectrogram  should 
be  directly  related  to  sound  perception.  Although  the  concept  of 
operating  on  a spectrogram  is  not  new,  we  have  found  a precisely 
defined  method  for  converting  an  altered  spectrogram  back  into  an 
acoustic  signal  that  can  be  presented  to  a listener.  This  signal 
reconstruction  method  may  lead  to  better  hearing  aid  design.  It 
may  also  lead  to  a new  method  for  auditory  experimentation,  where 
signals  and  maskers  are  synthesized  as  spectrograms,  i.  e.,  in 
terms  of  their  probable  representations  in  the  central  nervous 
system.  Such  a method  should  provide  further  insight  into  the 
detection  and  recognition  of  complex  acoustic  signals  by  the  brain. 
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2.2.  13  Appendii  os  for  Section  2 . ^ 

APPENDIX  A:  PROPER  I IKS  OF  SPECTROGRAMS 
AND  RELATED  FUNCTIONS 


Some  properties  of  time- frequency  energy  density  functions, 
spectrograms,  and  ambiguity  functions  are  listed  below.  Where 
limits  on  integrals  are  omitted,  the  limits  are  (-»,  «o). 


Property 1 (Ackroyd, 1971).

$$S_{uv}(t_1,f_1) = e_{uu}(t,f) * e_{vv}(t,-f) \tag{A1}$$

where (*) denotes two-dimensional convolution, i.e.,

$$S_{uv}(t_1,f_1) = \iint e_{uu}(t,f)\, e_{vv}(t_1-t,\, f-f_1)\, dt\, df .$$


Property 2 (Ackroyd, 1971).

$$|\chi_{uv}(-\tau,\phi)|^2 = e_{uu}(t,f) * e^{*}_{vv}(-t,-f) = \iint e_{uu}(t,f)\, e^{*}_{vv}(t+\tau,\, f-\phi)\, dt\, df . \tag{A2}$$


Property 3 (Stutt, 1964; Rihaczek, 1968).

$$F\{e_{uv}(t,f);\ \tau,\phi\} = \chi_{uv}(\tau,\phi) \tag{A3}$$

where $F\{e_{uv}(t,f);\ \tau,\phi\}$ denotes a two-dimensional Fourier transform
that maps $e_{uv}(t,f)$ onto the $\tau,\phi$ plane, i.e.,

$$\iint e_{uv}(t,f)\, \exp[-j2\pi(\phi t + \tau f)]\, dt\, df = \chi_{uv}(\tau,\phi) .$$
Property 4.
From (6) - (9),

$$|\chi_{uv}(t_1,f_1)|^2 = S_{u(t)\,v^{*}(-t)}(-t_1,f_1) \tag{A4}$$

$$S_{uv}(t_1,f_1) = |\chi_{u(t)\,v^{*}(-t)}(-t_1,f_1)|^2 \tag{A5}$$

Since many properties of the magnitude-squared ambiguity function
have been derived (Stutt, 1964; Cook and Bernfeld, 1967), the above
identity leads to similar properties of the spectrogram. Properties
5 to 7, for example, are shared by $S_{uv}(t,f)$ and $|\chi_{uv}(t,f)|^2$.

Property 5.

Smeared energy density spectra and time envelopes can be obtained
from marginals of the spectrogram, i.e.,

$$\int S_{uv}(t_1,f_1)\, dt_1 = \int |U(f)|^2\, |V(f-f_1)|^2\, df \tag{A6}$$

$$\int S_{uv}(t_1,f_1)\, df_1 = \int |u(t)|^2\, |v(t_1-t)|^2\, dt \tag{A7}$$

These relations are verified by substitution of (8) into the left-hand
side of (A6), and by substituting (9) into the left-hand side of (A7).


Property 6.

$$\iint S_{uv}(t_1,f_1)\, dt_1\, df_1 = E_u E_v \tag{A8}$$

where

$$E_u \equiv \int |U(f)|^2\, df = \int |u(t)|^2\, dt , \qquad E_v \equiv \int |V(f)|^2\, df = \int |v(t)|^2\, dt .$$

This property states that the volume under the spectrogram depends
strictly upon the energy of the signal and filter functions. The
volume is independent of the form of these functions. To verify
(A8), integrate (A6) with respect to $f_1$, or integrate (A7) with respect
to $t_1$.


Property 7.

$$S_{uv}(t_1,f_1) \le E_u E_v \tag{A10}$$

with equality if and only if there exist values of $t_1$ and $f_1$ such that

$$V(f-f_1)\,\exp(j2\pi f t_1) = U^{*}(f) . \tag{A11}$$

The magnitude of the spectrogram never exceeds the product $E_u E_v$,
and the upper bound is only obtained if one of the filters that is used
to form the spectrogram is matched to the signal. Eq. (A10) is
obtained by applying the Schwarz inequality to $S_{uv}(t_1,f_1)$, as defined
by (8).
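Properties 5 to 7 have direct discrete counterparts that are easy to check numerically. The Python sketch below is an illustration only: it assumes a cyclic, length-N spectrogram with DFT frequency bins, so that sums over the frequency index carry an extra factor of N relative to the integrals above, and the function and variable names are hypothetical.

```python
import numpy as np

def spectrogram(u, v):
    """S[m, k] = |sum_n u[n] v[(m - n) mod N] exp(-j 2 pi k n / N)|^2."""
    N = len(u)
    n = np.arange(N)
    return np.array([np.abs(np.fft.fft(u * v[(m - n) % N])) ** 2 for m in range(N)])

rng = np.random.default_rng(2)
N = 32
n = np.arange(N)
u = rng.standard_normal(N) + 1j * rng.standard_normal(N)   # signal samples
v = rng.standard_normal(N) + 1j * rng.standard_normal(N)   # filter impulse response

S = spectrogram(u, v)
E_u = np.sum(np.abs(u) ** 2)
E_v = np.sum(np.abs(v) ** 2)

# Property 5, cf. (A7): the frequency marginal is a smeared time envelope.
time_marginal = S.sum(axis=1) / N
expected_marginal = np.array(
    [np.sum(np.abs(u) ** 2 * np.abs(v[(m - n) % N]) ** 2) for m in range(N)]
)

# Property 6, cf. (A8): the volume depends only on the two energies.
volume = S.sum() / N

# Property 7, cf. (A10): Schwarz bound, reached when the filter is matched.
peak = S.max()
v_matched = np.conj(u[(-n) % N])        # time-reversed, conjugated copy of u
S_matched = spectrogram(u, v_matched)

print(volume, E_u * E_v, peak, S_matched[0, 0], E_u ** 2)
```

For the matched filter the bound is attained at zero shift, where the spectrogram equals $E_u^2 = E_u E_v$.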


Property 8.

$$F\{S_{uv}(t,f);\ \tau,\phi\} = \chi_{uu}(\tau,\phi)\,\chi_{vv}(-\tau,\phi) \tag{A12}$$

or

$$\chi_{uu}(\tau,\phi) = F\{S_{uv}(t,f);\ \tau,\phi\}\,/\,\chi_{vv}(-\tau,\phi) . \tag{A13}$$

Eqs. (A12) and (A13) follow directly from (A1) and (A3). Eq. (A13)
implies that, in the absence of noise, $\chi_{uu}(\tau,\phi)$ can be obtained from
a spectrogram, if the filter function $v(t)$ is known. The significance
of this result is enhanced by the following property and by Property 14.


Property 9 (Siebert, 1958; Titlebaum and DeClaris, 1966).

    X_{u_1 u_1}(\tau, \phi) = X_{u_2 u_2}(\tau, \phi)  if and only if
    u_1(t) = u_2(t) \exp(j\lambda), where \lambda is real.

This observation was originally derived from the separation
equation

    \int X_{uv}(\tau - t, \phi) \exp(j 2\pi t \phi)\, d\phi = u(t) \, v^*(\tau) .        (A14)

The result can also be derived from Property 3 by taking
\mathcal{F}\{ X_{uu}(\tau, \phi);\ t, f \} = e_{uu}(t, f), and by forming

    \int e_{uu}(t, f) \exp[ j 2\pi f (t - \tau) ]\, df = u(t) \, u^*(\tau) .        (A15)

In either case, |u(\tau_0)| is obtained by setting t = \tau = \tau_0, where \tau_0 is
a constant. Then u(t) [ u^*(\tau_0) / |u(\tau_0)| ] is the desired waveform
u(t), multiplied by a complex constant with unity magnitude.
It follows that one can obtain u(t) \exp(j\lambda) from X_{uu}(\tau, \phi) or
e_{uu}(t, f), where \lambda is a real, unknown constant.
uu 


Property 10 (Stutt, 1964).

The counterpart of Property 8 for the magnitude-squared ambiguity
function is

    \mathcal{F}\{ |X_{uv}(-\tau, \phi)|^2;\ t, f \} = X_{uu}(t, f) \, X_{vv}^*(t, f) ,        (A16)

i.e.,

    \iint |X_{uv}(-\tau, \phi)|^2 \exp[ -j 2\pi (f\tau + t\phi) ]\, d\tau\, d\phi
        = X_{uu}(t, f) \, X_{vv}^*(t, f) .

In particular,

    \mathcal{F}\{ |X_{uu}(-\tau, \phi)|^2;\ t, f \} = |X_{uu}(t, f)|^2 ,

i.e., the magnitude-squared autoambiguity function is its own
Fourier transform. Eq. (A2) can be obtained directly from (A3)
and (A16) by using the fact that multiplication of two functions is the
same as convolution of their Fourier transforms.


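Property 11.

Multiplying both sides of (A12) by X_{vv}^*(-\tau, \phi), we have

    \mathcal{F}\{ S_{uv}(t, f);\ \tau, \phi \} \, X_{vv}^*(-\tau, \phi)
        = X_{uu}(\tau, \phi) \, |X_{vv}(-\tau, \phi)|^2 .

Using Property 10 and (A3),

    S_{uv}(t, f) * e_{vv}^*(-t, f) = e_{uu}(t, f) * |X_{vv}(t, f)|^2 .        (A17)

The above relation can also be obtained from (A1) and (A2), and
this alternate derivation leads to a generalized version of (A17):

    S_{u_1 v_1}(t, f) * e_{v_2 v_2}^*(-t, f) = e_{u_1 u_1}(t, f) * |X_{v_1 v_2}(t, f)|^2 ,

i.e.,

    \iint S_{u_1 v_1}(t, f) \, e_{v_2 v_2}^*(t - \tau, \phi - f)\, dt\, df
        = \iint e_{u_1 u_1}(t, f) \, |X_{v_1 v_2}(\tau - t, \phi - f)|^2 \, dt\, df .        (A18)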


Property  12. 

Two  different  short-time  spectral  histories  can  be  compared  by 
means  of  the  cross-correlation  operation 


    \iint S_{u_1 v_1}(t, f) \, S_{u_2 v_2}(t + \tau, f + \phi)\, dt\, df
        = S_{u_1 v_1}^*(-t, -f) * S_{u_2 v_2}(t, f)
        = e_{u_1 u_1}(-t, -f) * e_{v_1 v_1}^*(-t, f) * e_{u_2 u_2}(t, f) * e_{v_2 v_2}(t, -f)
        = |X_{u_2 u_1}(-t, f)|^2 * |X_{v_2 v_1}(t, -f)|^2
        = \iint |X_{u_2 u_1}(t, f)|^2 \, |X_{v_2 v_1}(\tau + t, f - \phi)|^2 \, dt\, df .        (A19)

The above sequence of equalities made use of (A1) and (A2). Equation
(A19) is similar to a result for wideband auto-ambiguity functions
that was published by Flaska (1976), with credit to C. E. Persons.


Property  13. 

Another way to compare two different signals is to integrate the
product of their time-frequency energy density functions. By using
(6) and Parseval's theorem, it is easy to prove that

    \iint e_{u_1 v_1}(t, f) \, e_{u_2 v_2}^*(t, f)\, dt\, df
        = \int u_1(t) \, u_2^*(t)\, dt \int v_1^*(t) \, v_2(t)\, dt .        (A20)

An important special case of (A20) is obtained by letting v_1(t) =
u_1(t) and v_2(t) = u_2(t) = u_1(t - \tau) \exp(j 2\pi\phi t). In this case, (A20)
becomes

    \iint e_{u_1 u_1}(t, f) \, e_{u_2 u_2}^*(t, f)\, dt\, df = |X_{u_1 u_1}(\tau, \phi)|^2 .        (A21)



Property  14. 

By using Property 3 and applying Parseval's theorem, (A20) can be
written

    \iint X_{u_1 v_1}(\tau, \phi) \, X_{u_2 v_2}^*(\tau, \phi)\, d\tau\, d\phi
        = \int u_1(t) \, u_2^*(t)\, dt \int v_1^*(t) \, v_2(t)\, dt .        (A22)

The above equation can also be verified by direct substitution of
(7) into the left-hand side. Once again, if v_1(t) = u_1(t) and v_2(t) =
u_2(t) = u_1(t - \tau_1) \exp(j 2\pi\phi_1 t), we have

    \iint X_{u_1 u_1}(\tau, \phi) \, X_{u_2 u_2}^*(\tau, \phi)\, d\tau\, d\phi = |X_{u_1 u_1}(\tau_1, \phi_1)|^2 .

Similarly, if v_1(t) = u_1(t) and v_2(t) = u_2(t) in (A22), we have

    \iint X_{u_1 u_1}(\tau, \phi) \, X_{u_2 u_2}^*(\tau, \phi)\, d\tau\, d\phi
        = | \int u_1(t) \, u_2^*(t)\, dt |^2 .        (A23)

The significance of (A23) is that two signals can be correlated by
operating upon their autoambiguity functions, rather than upon the
signals themselves. Eq. (A23) can also be obtained from Stutt's
Fourier transform equation, (A16).

Property  15. 

Because the time-window version of the spectrogram is commonly
used in spectrogram representations (Flanagan, 1965) and in
auditory theory (Hartmann, 1978), it is useful to relate the
time-window spectrogram S^T_{uw}(t_1, f_1) to the frequency-window
spectrogram that is defined in (8) and (9). The time-window spectrogram
is obtained by multiplying the data u(t) by a delayed window function
w(t - t_1) and by taking the Fourier transform of the resulting
product, i.e.,

    S^T_{uw}(t_1, f_1) = | \int u(t) \, w(t - t_1) \exp(-j 2\pi f_1 t)\, dt |^2 .        (A24)

By changing variables in (9), the frequency-window (bank-of-filters)
spectrogram can be written

    S_{uv}(t_1, f_1) = | \int_{-\infty}^{\infty} u(t) \, v(t_1 - t) \exp(-j 2\pi f_1 t)\, dt |^2 .        (A25)

Comparison of (A24) and (A25) indicates that

    S^T_{uw}(t_1, f_1) = S_{uv}(t_1, f_1)   if   w(t) = v(-t) ,        (A26)

where v(t) is the impulse response of the basic filter function that
is used to construct the frequency-window spectrogram.
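The equivalence in (A26) is easy to confirm on sampled data. The following sketch assumes discrete, circular analogues of (A24) and (A25); the two-tone signal and the decaying impulse response are hypothetical choices, with the impulse response made deliberately asymmetric so that time reversal matters.

```python
import cmath
import math

def time_window_spec(u, w, m, k):
    """Discrete analogue of (A24): window w delayed by m, then DFT bin k."""
    n_pts = len(u)
    acc = sum(u[n] * w[(n - m) % n_pts] * cmath.exp(-2j * math.pi * k * n / n_pts)
              for n in range(n_pts))
    return abs(acc) ** 2

def filter_bank_spec(u, v, m, k):
    """Discrete analogue of (A25): impulse response v evaluated at m - n."""
    n_pts = len(u)
    acc = sum(u[n] * v[(m - n) % n_pts] * cmath.exp(-2j * math.pi * k * n / n_pts)
              for n in range(n_pts))
    return abs(acc) ** 2

N = 16
u = [math.cos(2 * math.pi * 2 * n / N) + 0.5 * math.sin(2 * math.pi * 5 * n / N)
     for n in range(N)]
v = [math.exp(-n / 3.0) for n in range(N)]   # asymmetric, decaying impulse response
w = [v[(-n) % N] for n in range(N)]          # w(t) = v(-t), as required by (A26)

cells = [(m, k) for m in range(N) for k in range(N)]
max_diff = max(abs(time_window_spec(u, w, m, k) - filter_bank_spec(u, v, m, k))
               for m, k in cells)
# Without the time reversal the two spectrograms differ.
max_diff_unreversed = max(abs(time_window_spec(u, v, m, k) - filter_bank_spec(u, v, m, k))
                          for m, k in cells)
print(max_diff, max_diff_unreversed)
```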


APPENDIX B: ESTIMATION FROM A NOISY SPECTROGRAM

The  problem  is  to  derive  a filter  for  estimating  a signal's 
ambiguity  function  from  a noise-corrupted  spectrogram. 

From Figure 1, we have an estimate

    \hat{X}_{uu}(\tau, \phi) = H(\tau, \phi) \{ [ X_{uu}(\tau, \phi) + N_e(\tau, \phi) ] X_{vv}(-\tau, \phi) + N_i(\tau, \phi) \} .        (B1)

The mean-square error that is associated with this estimate is

    MSE(\epsilon) = E\{ \iint | X_{uu} - H [ (X_{uu} + N_e) X_{vv} + N_i ] |^2 \, d\tau\, d\phi \}        (B2)

where

    H(\tau, \phi) = H_0(\tau, \phi) + \epsilon \, \eta(\tau, \phi)        (B3)

and \eta(\tau, \phi) is an arbitrary, piecewise-smooth function.

Define

    B(\tau, \phi) \equiv [ X_{uu}(\tau, \phi) + N_e(\tau, \phi) ] X_{vv}(-\tau, \phi) + N_i(\tau, \phi) ,        (B4)

so that

    MSE(\epsilon) = E\{ \iint | X_{uu} - B H |^2 \, d\tau\, d\phi \} .        (B5)

In (B2) and (B5), the expectation operator is an ensemble average
with respect to N_e, N_i and all admissible input signals, u(t).


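Differentiating MSE(\epsilon) with respect to \epsilon in (B5) yields

    \partial MSE(\epsilon) / \partial\epsilon |_{\epsilon = 0}
        = 2 \, Re \iint \eta^* [ H_0 E\{ |B|^2 \} - E\{ X_{uu} B^* \} ]\, d\tau\, d\phi

and

    \partial MSE(\epsilon) / \partial\epsilon |_{\epsilon = 0} = 0

for any \eta(\tau, \phi) if

    H_0 = E\{ X_{uu} B^* \} / E\{ |B|^2 \}

        = \frac{ |X_{uu}(\tau, \phi)|^2 \, X_{vv}^*(-\tau, \phi) }
               { |X_{vv}(-\tau, \phi)|^2 [ |X_{uu}(\tau, \phi)|^2 + P_e(\tau, \phi) ] + P_i(\tau, \phi) }

where it has been assumed that

    E\{ N_e(\tau, \phi) \} = 0 ,
    E\{ N_i(\tau, \phi) \} = 0 ,
    E\{ N_e^*(\tau, \phi) N_i(\tau, \phi) \} = 0 ,
    E\{ |N_e(\tau, \phi)|^2 \} = P_e(\tau, \phi) ,
    E\{ |N_i(\tau, \phi)|^2 \} = P_i(\tau, \phi) ,
    E\{ |X_{uu}(\tau, \phi)|^2 \} = |X_{uu}(\tau, \phi)|^2 .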
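The minimizing filter can be checked at a single point of the (\tau, \phi) plane. The sketch below is a scalar illustration only: it assumes zero-mean, mutually uncorrelated noise terms with powers P_e and P_i, as listed above, expands the mean-square error in closed form, and confirms that perturbing the filter away from H_0 can only increase the error. The numerical values are hypothetical.

```python
def mse(h, s_uu, x_vv, p_e, p_i):
    """E|X_uu - h B|^2 at one (tau, phi) point, with B = (X_uu + N_e) X_vv + N_i.

    s_uu stands for E{|X_uu|^2}; cross terms vanish because the noise terms
    are assumed zero-mean and uncorrelated with each other and with the signal.
    """
    return s_uu * abs(1 - h * x_vv) ** 2 + abs(h) ** 2 * (abs(x_vv) ** 2 * p_e + p_i)

def h_opt(s_uu, x_vv, p_e, p_i):
    """Minimizing filter H_0 = E{X_uu B*} / E{|B|^2} at one point."""
    return s_uu * x_vv.conjugate() / (abs(x_vv) ** 2 * (s_uu + p_e) + p_i)

# Hypothetical values at one point of the ambiguity plane.
s_uu, x_vv, p_e, p_i = 2.0, 0.6 - 0.3j, 0.5, 0.1

h0 = h_opt(s_uu, x_vv, p_e, p_i)
mse0 = mse(h0, s_uu, x_vv, p_e, p_i)
perturbed = [mse(h0 + d, s_uu, x_vv, p_e, p_i)
             for d in (0.05, -0.05, 0.05j, -0.05j, 0.03 + 0.04j)]
print(h0, mse0, min(perturbed))
```

Because the error is a convex quadratic in the filter value, every perturbation listed above gives a strictly larger error than `mse0`.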


For given noise distributions P_e(\tau, \phi) and P_i(\tau, \phi), what is the best
analyzing filter V(f)? If P_i(\tau, \phi) = 0, MSE_0 is independent of V(f),
but if P_i(\tau, \phi) \neq 0, we can find |X_{vv}(-\tau, \phi)|^2 such that MSE_0 is
minimized. In order to obtain a solution to this problem, we add a
constraint that the volume of |X_{vv}(-\tau, \phi)|^2 equal E_v^2, as in (20). Then


The Euler condition is


Since |X_{vv}|^2 \geq 0, we have (Shnidman, 1975)


This solution represents a minimum value of MSE_0, since the second
variation of the functional with respect to |X_{vv}|^2 is positive if P_i > 0.


APPENDIX C: DETECTION BY CORRELATION
OF SPECTROGRAMS


1. Detector Configuration

In order to derive a detector configuration, we must first
obtain the probability density function (pdf) of a noisy spectrogram.
The data is assumed to consist of signal plus white Gaussian
noise with power spectral density N_0/2. The data is first passed
through a bank of linear filters with transfer functions V(f - f_j),
j = 1, ..., M. The real part of the jth filter output is Gaussian
(Davenport and Root, 1958) with power spectral density
(N_0/2) |V(f - f_j)|^2. The variance of this process is

    \sigma^2 = (N_0/2) E_v        (C1)

where E_v was defined in (A9).

For Gaussian input noise, the real and imaginary parts of
each filter output are statistically independent and Gaussian. This
statement follows from the following three observations:

(i) Linear filtering preserves Gaussianity.

(ii) Uncorrelated Gaussian random variables are
statistically independent.

(iii) For any random process x(t) that has a Hilbert
transform \hat{x}(t),

    E\{ x(t) \, \hat{x}(t) \} = 0 ,        (C2)

i.e., x(t) and \hat{x}(t) are uncorrelated (Papoulis).

For non-Gaussian input noise, the above result will still hold if we
can modify observation (i) to read "linear filtering causes
non-Gaussian signals to become Gaussian." This modified statement is
approximately true of narrowband filters (Papoulis, 1972b) and
wideband filters with large time-bandwidth product (Altes, 1975).

Let x_{ij} denote the response x(t) of the jth filter at time i,
and let y_{ij} be a sample at time i of the Hilbert transform of the jth
filter response. The envelope of the filter response is formed as
in (3), and we are interested in the pdf of the scaled envelope,
which is a sample of the spectrogram:

    Z(t_i, f_j) \equiv z_{ij} = (x_{ij}/\sqrt{2})^2 + (y_{ij}/\sqrt{2})^2 .        (C3)

The random variables x = x_{ij}/\sqrt{2} and y = y_{ij}/\sqrt{2} are Gaussian and
independent, with joint pdf

    p_{xy}(x, y) = (\pi\sigma^2)^{-1} \exp\{ -[ (x - \eta_1)^2 + (y - \eta_2)^2 ] / \sigma^2 \}        (C4)

where the noise-free spectrogram sample is

    s_{ij} = \eta_1^2 + \eta_2^2 .        (C5)

To find the pdf of Z(t_i, f_j) in (C3), we can use some
well-known results. First, let

    w = (x^2 + y^2)^{1/2} .

From Papoulis (1965), p. 196, and (C4) and (C5),

    p_w(w) = (2w/\sigma^2) \exp[ -(w^2 + s_{ij}) / \sigma^2 ] \, I_0( 2 w s_{ij}^{1/2} / \sigma^2 )        (C6)

for w \geq 0, and p_w(w) = 0 for w < 0. Since

    z_{ij} = w^2 ,

we have, from Papoulis (1965), p. 129,

    p_{z_{ij}}(z_{ij} | H_1) = \sigma^{-2} \exp[ -(z_{ij} + s_{ij}) / \sigma^2 ] \, I_0( 2 s_{ij}^{1/2} z_{ij}^{1/2} / \sigma^2 ) .        (C7)

When the signal is absent, s_{ij} = 0 and

    p_{z_{ij}}(z_{ij} | H_0) = \sigma^{-2} \exp( -z_{ij} / \sigma^2 ) .        (C8)

In (C7) and (C8), I_0(\cdot) is a modified Bessel function of zero order.

Eq. (C7) is the same as the pdf obtained by McGill (1968)
from the Rayleigh-Rice distribution (C6). Rice's derivation of (C6),
however, is based upon the assumption that the filter responses are
narrowband (Rice, 1954). Eq. (C2) has allowed us to discard the
narrowband assumption.
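As a sanity check on (C7) and (C8), the density of one spectrogram sample can be integrated numerically. The sketch below is a minimal illustration: it evaluates I_0 from its power series, uses a simple trapezoid rule, and assumes hypothetical values s_ij = 2 and \sigma^2 = 1. Under the Gaussian model of (C4), the mean of z_ij should equal s_ij + \sigma^2.

```python
import math

def bessel_i0(x):
    """Modified Bessel function I_0(x) from its series sum_k (x^2/4)^k / (k!)^2."""
    q = x * x / 4.0
    term = 1.0
    total = 1.0
    k = 0
    while term > 1e-17 * total:
        k += 1
        term *= q / (k * k)
        total += term
    return total

def pdf_signal_plus_noise(z, s, sigma2):
    """(C7): pdf of one spectrogram sample when the signal is present."""
    return math.exp(-(z + s) / sigma2) * bessel_i0(2.0 * math.sqrt(s * z) / sigma2) / sigma2

def pdf_noise_only(z, sigma2):
    """(C8): pdf of one spectrogram sample when only noise is present."""
    return math.exp(-z / sigma2) / sigma2

def trapezoid(func, upper, steps):
    """Trapezoid-rule integral of func over [0, upper]."""
    h = upper / steps
    total = 0.5 * (func(0.0) + func(upper))
    for i in range(1, steps):
        total += func(i * h)
    return total * h

s, sigma2 = 2.0, 1.0   # hypothetical noise-free sample and noise variance
area = trapezoid(lambda z: pdf_signal_plus_noise(z, s, sigma2), 60.0, 6000)
mean = trapezoid(lambda z: z * pdf_signal_plus_noise(z, s, sigma2), 60.0, 6000)
print(area, mean)
```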

Eqs. (C7) and (C8) specify the pdf for one sample of the
spectrogram. Assuming that samples are sufficiently far apart to
be statistically independent when only noise is present, we can
construct the pdf of an entire spectrogram by forming the products
of the pdf's of the individual, independent samples, and the
likelihood ratio is

    \Lambda = \exp( -\sum_{i=1}^{N} \sum_{j=1}^{M} s_{ij} / \sigma^2 )
              \prod_{i=1}^{N} \prod_{j=1}^{M} I_0( 2 s_{ij}^{1/2} z_{ij}^{1/2} / \sigma^2 ) .        (C9)


A locally optimum detector for very small signal-to-noise
ratio (SNR) can be obtained from (C9). The locally optimum
detection statistic is determined by calculating \partial \ln\Lambda / \partial\theta at \theta = 0,
where \theta is an SNR parameter (Capon, 1961; Middleton, 1966).
If the signal u(t) is amplified by a factor A, i.e., if the signal is
written A u(t), then s_{ij} in (C9) is replaced by A^2 s_{ij}. The SNR
parameter \theta can then be defined as

    \theta = A^2 / \sigma^2        (C10)

and

    I_0( 2 A s_{ij}^{1/2} z_{ij}^{1/2} / \sigma^2 )
        = 1 + \theta s_{ij} z_{ij} / \sigma^2 + ( \theta s_{ij} z_{ij} / \sigma^2 )^2 / 4 + ...        (C11)

The locally optimum detector compares the quantity

    \partial \ln\Lambda / \partial\theta |_{\theta = 0}
        = \sum_{i=1}^{N} \sum_{j=1}^{M} ( -s_{ij} + s_{ij} z_{ij} / \sigma^2 )        (C12)


with a threshold. If the signal u(t) and the filter function v(t)
have unit energy, then from (A8) the term

    \iint S_{uv}(t, f)\, dt\, df = E_u E_v        (C13)

is the same for all situations, and this term can be incorporated
into the threshold. The threshold can also be multiplied by \sigma^2,
and the locally optimum test for distinguishing between signal plus
noise (H_1) or noise alone (H_0) is

    \sum_{i=1}^{N} \sum_{j=1}^{M} s_{ij} z_{ij} \ \underset{H_0}{\overset{H_1}{\gtrless}} \ \gamma        (C14)

where \gamma is a threshold.
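The step from the likelihood ratio (C9) to the correlation test (C14) can be illustrated numerically. The sketch below assumes that every s_ij in (C9) is scaled by a squared gain (called `gain2` here, a hypothetical name) and compares a finite-difference slope of ln \Lambda at zero gain with the closed-form sum. The slope is an affine function of the correlation \sum s_ij z_ij, which is why thresholding the correlation is equivalent. The sample values are hypothetical.

```python
import math

def bessel_i0(x):
    """Modified Bessel function I_0(x) from its power series."""
    q = x * x / 4.0
    term = 1.0
    total = 1.0
    k = 0
    while term > 1e-17 * total:
        k += 1
        term *= q / (k * k)
        total += term
    return total

def log_likelihood_ratio(z, s, sigma2, gain2):
    """ln of (C9) with every s_ij replaced by gain2 * s_ij."""
    total = 0.0
    for z_ij, s_ij in zip(z, s):
        total -= gain2 * s_ij / sigma2
        total += math.log(bessel_i0(2.0 * math.sqrt(gain2 * s_ij * z_ij) / sigma2))
    return total

def correlation_statistic(z, s):
    """Left-hand side of (C14): correlation of data and reference spectrograms."""
    return sum(z_ij * s_ij for z_ij, s_ij in zip(z, s))

sigma2 = 1.5
s = [0.0, 0.2, 1.0, 0.4, 0.1, 0.0]   # hypothetical noise-free samples (flattened)
z = [1.1, 0.7, 3.2, 2.0, 0.3, 1.9]   # hypothetical data samples (flattened)

eps = 1e-7
numeric_slope = (log_likelihood_ratio(z, s, sigma2, eps)
                 - log_likelihood_ratio(z, s, sigma2, 0.0)) / eps
# Closed form: sum over cells of (-s/sigma^2 + s z / sigma^4).
analytic_slope = (correlation_statistic(z, s) / sigma2 - sum(s)) / sigma2
print(numeric_slope, analytic_slope)
```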

The locally optimum detector for very small SNR correlates
sampled versions of the noise-free signal spectrogram S_{uv}(t, f) and
the data spectrogram, Z(t, f).


APPENDIX D: PHASE INFORMATION FOR
DIRECTION ESTIMATES


A. Destruction of Relative Phase Information in Narrowband
Signals


From Property 9 in Appendix A, a signal u_1(t) that is
reconstructed from its spectrogram is multiplied by a constant


where \tau_0 is a time that can be chosen by the receiver.

Let u_1(t) be a narrowband signal from sensor No. 1, where


From (D1), the phase of the reconstructed version of u_1(t) is


Let u_2(t) be a delayed version of u_1(t) that is observed at


If an estimate of u_2(t) is reconstructed from its spectrogram, its
phase is


The relative delay parameter \tau, which should be used to provide
accurate narrowband direction estimates, is eliminated from \theta_2(t)
by the reconstruction process, and \theta_1(t) = \theta_2(t) from (D2) and (D3).
Narrowband signals that have been reconstructed from spectrograms
therefore cannot provide accurate relative delay estimates for
direction measurement.

B.  Sufficient  Phase  Information  for  Complete  Reconstruction 
of  a Signal  from  Its  Spectrogram 

Two-dimensional deconvolution, together with the operations
in Property 9 of Appendix A, yields a signal estimate

    \hat{u}(t) = u(t) \exp(j\lambda)

with Fourier transform

    \hat{U}(f) = U(f) \exp(j\lambda) .

The contribution of the jth filter to the spectrogram is

    S(t, f_j) = |a_j(t)|^2        (D4)

where

    a_j(t) = \int U(f) \, V(f - f_j) \exp(j 2\pi f t)\, df ,

and where V(f) is the known transfer function of a low pass filter.
If a_j(t) as well as |a_j(t)|^2 can be observed, then the Fourier
transform

    A_j(f) = U(f) \, V(f - f_j)        (D5)

can be computed. The signal spectrum sampled at frequency f_j can
be determined from (D5), i.e.,

    U(f_j) = A_j(f_j) / V(0) .        (D6)

Combining (D4) and (D6),

    \hat{U}(f_j) \exp(-j\lambda) = A_j(f_j) / V(0)

and

    \exp(j\lambda) = \hat{U}(f_j) \, V(0) / A_j(f_j) .        (D7)

By observing the phase, as well as the amplitude, at the output
of a single filter, we can deduce the value of the parameter \lambda
that is added to the phase of the reconstructed spectrum \hat{U}(f_j).
Knowledge of \lambda allows us to completely determine the signal u(t)
from its spectrogram. A complete reconstruction of u(t) is useful
for direction estimation with narrowband signals.




APPENDIX  E:  CRITICAL  BAND  EFFECTS 

A.  Reconstruction  Error  for  Narrowband  Signals  in  Narrowband 
Noise 

Suppose that P_i(\tau, \phi) in (B10) is set equal to a small,
nonzero constant. The mean square error in the reconstructed signal
will remain small if |X_{vv}(\tau, \phi)|^2 P_e(\tau, \phi) is small, but MSE_0 will
increase as |X_{vv}(\tau, \phi)|^2 P_e(\tau, \phi) becomes larger. This concept can
be exploited to heuristically predict the behavior of the estimator.
In particular, we shall consider a narrowband signal with center
frequency f_u and narrowband noise with center frequency f_n. After
the transformation (10), the signal has center frequency \ln f_u and
the noise has center frequency \ln f_n.

From (23),

    P_e(\tau, \phi) = E\{ | X_{nn} + X_{un} + X_{nu} |^2 \}

                = E\{ |X_{nn}|^2 + |X_{un}|^2 + |X_{nu}|^2
                      + 2 \, Re( X_{nn} X_{un}^* + X_{nn} X_{nu}^* + X_{un} X_{nu}^* ) \} .        (E1)

2 

The half-power, constant-magnitude contour of |X_{vv}(\tau, \phi)|^2 is
approximately an ellipse on the (\tau, \phi) plane (Cook and Bernfeld, 1967)
with \phi-width equal to T_v^{-1} and \tau-width equal to B_v^{-1}, and the ellipse
is centered at (\tau, \phi) = (0, 0). The half-power, constant-magnitude
contour of |X_{nn}(\tau, \phi)|^2 is approximately an ellipse centered at
(\tau, \phi) = (0, 0) with \phi-width equal to T_n^{-1} and \tau-width equal to B_n^{-1},
where B_n and T_n are the bandwidth and duration of the narrowband
noise sample n(t). The half-power, constant-magnitude contour of
|X_{un}(\tau, \phi)|^2 is approximately an ellipse centered at
(\tau, \phi) = (0, \ln f_u - \ln f_n), with \phi-width equal to (T_n T_u)^{-1/2} and
\tau-width equal to (B_n B_u)^{-1/2}. The half-power, constant-magnitude
contour of |X_{nu}(\tau, \phi)|^2 is approximately an ellipse centered at
(\tau, \phi) = (0, \ln f_n - \ln f_u), with \phi-width (T_n T_u)^{-1/2} and
\tau-width (B_n B_u)^{-1/2}.

The product |X_{vv}(\tau, \phi)|^2 P_e(\tau, \phi) becomes significantly larger when
the |X_{un}|^2 and |X_{nu}|^2 ellipses begin to overlap with the |X_{vv}|^2
ellipse. Assuming that T_n = T_u, i.e., that the signal and noise
sample functions have the same duration, overlap occurs when

    | \ln f_n - \ln f_u | < ( T_v^{-1} + T_u^{-1} ) / 2 .

Assuming that T_u >> T_v and B_v \approx T_v^{-1}, the product |X_{vv}|^2 P_e
becomes larger when

    \max( f_u/f_n , f_n/f_u ) < \exp( B_v / 2 ) .        (E2)


The above heuristic result suggests that the noise power in
the reconstructed data (MSE_0) grows monotonically as f_n/f_u \to 1,
provided that (E2) is satisfied. For example, the detectability of
a reconstructed narrowband signal should remain constant as the
center frequency f_n of the narrowband noise approaches f_u from
below, until f_n = f_u \exp(-B_v/2). For f_n > f_u \exp(-B_v/2), the
detectability should be degraded because of increasing noise power in the
reconstructed process, until f_n = f_u. This interpretation is
descriptive of a critical band effect for narrowband signals that are
masked by narrowband noise.

B. Signal-to-Noise Ratio in a Spectrogram Correlator


A spectrogram correlation operation can give rise to a
critical band effect when a narrowband signal is masked by
narrowband noise. We shall use the following definitions:

    P_u(f)          = power spectral density of the signal, which is a
                      narrowband random process
    P_n(f)          = power spectral density of noise
    S_{uv}(t, f)    = signal spectrogram
    S_{nv}(t, f)    = noise spectrogram
    S_{u_H v}(t, f) = spectrogram of a reference signal, u_H(t).

Although the actual signal is modelled as a random process, we
assume that the sinusoidal reference signal u_H(t) is not random,
and it has Fourier transform U_H(f).

Cross-correlation of signal and reference spectrograms
produces an expected signal response

    r_s = E\{ \iint S_{uv}(t, f) \, S_{u_H v}(t, f)\, dt\, df \} .

Cross-correlation of noise and reference spectrograms produces
an expected noise response

    r_n = E\{ \iint S_{nv}(t, f) \, S_{u_H v}(t, f)\, dt\, df \} .

Signal-to-noise ratio at the detector output is then

    SNR = r_s / r_n
        = \frac{ \iint E\{ S_{uv}(t, f) \} \, S_{u_H v}(t, f)\, dt\, df }
               { \iint E\{ S_{nv}(t, f) \} \, S_{u_H v}(t, f)\, dt\, df }

where

    E\{ S_{uv}(t, f) \} = expected signal power from a filter with
                          transfer function V(x - f)
                        = \int_{-\infty}^{\infty} P_u(x) \, |V(x - f)|^2 \, dx ,

    E\{ S_{nv}(t, f) \} = \int_{-\infty}^{\infty} P_n(x) \, |V(x - f)|^2 \, dx ,

    S_{u_H v}(t, f) = | \int_{-\infty}^{\infty} U_H(x) \, V(x - f) \, e^{j 2\pi x t}\, dx |^2 .


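It follows that

    r_s = \int_{-\infty}^{\infty} [ \int_{-\infty}^{\infty} P_u(x_1) |V(x_1 - f)|^2 dx_1 ]
          [ \int_{-\infty}^{\infty} S_{u_H v}(t, f)\, dt ]\, df ,        (E7)

    r_n = \int_{-\infty}^{\infty} [ \int_{-\infty}^{\infty} P_n(x_1) |V(x_1 - f)|^2 dx_1 ]
          [ \int_{-\infty}^{\infty} S_{u_H v}(t, f)\, dt ]\, df ,        (E8)

where, from Property 5 (A6) in Appendix A,

    \int_{-\infty}^{\infty} S_{u_H v}(t, f)\, dt = \int_{-\infty}^{\infty} |U_H(x_2)|^2 \, |V(x_2 - f)|^2 \, dx_2 .        (E9)

For narrowband signal and noise processes, we have

    P_u(f) \approx P_u \, e^f \delta(e^f - f_u)
    P_n(f) \approx P_n \, e^f \delta(e^f - f_n)
    |U_H(f)|^2 \approx P_H \, e^f \delta(e^f - f_H)        (E10)

where the narrowband spectral representations P_u \delta(f - f_u),
P_n \delta(f - f_n), and P_H \delta(f - f_H) have been pre-distorted by the
transformation in (10). Substituting (E9) and (E10) into (E7) and (E8),

    r_s = P_u P_H \int_{-\infty}^{\infty} [ \int_0^{\infty} \delta(x_1 - f_u) |V(\log x_1 - f)|^2 dx_1 ]
          [ \int_0^{\infty} \delta(x_2 - f_H) |V(\log x_2 - f)|^2 dx_2 ]\, df

        = P_u P_H \int_{-\infty}^{\infty} |V(\log f_u - f)|^2 \, |V(\log f_H - f)|^2 \, df        (E11)

    r_n = P_n P_H \int_{-\infty}^{\infty} |V(\log f_n - f)|^2 \, |V(\log f_H - f)|^2 \, df .        (E12)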

Assuming that f_H = f_u, i.e., that the center frequency of
the signal process has been correctly hypothesized, we have

    SNR = r_s / r_n
        = \frac{P_u}{P_n} \,
          \frac{ \int_{-\infty}^{\infty} |V(f)|^4 \, df }
               { \int_{-\infty}^{\infty} |V(f)|^2 \, |V[ f + \log(f_u/f_n) ]|^2 \, df } .        (E13)

SNR is minimized if the narrowband masking or noise process is
such that f_n = f_u. For a filter V(f) with bandwidth B_v, the noise
has no effect if

    | \ln(f_u/f_n) | > B_v        (E14)

i.e., if

    f_n/f_u > c        (E15)

or

    f_n/f_u < 1/c        (E16)

where c = \exp(B_v).

The  SNR  results  for  a narrowband  signal  in  narrowband 
noise  suggest  that  the  filters  in  a spectrogram  correlator  can  be 
characterized  by  measuring  detection  performance  as  a function  of 
frequency  difference  between  signal  and  noise.  This  method  is  well 
known  to  psychoacousticians,  and  (E14)  — (E16)  describe  a critical 
band  effect. 
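The frequency dependence in (E13) can be explored with a short numerical sketch. It assumes, purely for illustration, a Gaussian filter shape |V(f)|^2 = exp[-(f/B_v)^2] on the log-frequency axis and evaluates SNR divided by P_u/P_n with a trapezoid rule; the function names and the bandwidth value are hypothetical. For this particular filter the ratio has the closed form exp[\ln^2(f_u/f_n) / (2 B_v^2)], which is smallest when f_n = f_u.

```python
import math

def filter_power(f, bandwidth):
    """Hypothetical Gaussian |V(f)|^2 on the log-frequency axis."""
    return math.exp(-((f / bandwidth) ** 2))

def integrate(func, lower, upper, steps):
    """Trapezoid-rule integral of func over [lower, upper]."""
    h = (upper - lower) / steps
    total = 0.5 * (func(lower) + func(upper))
    for i in range(1, steps):
        total += func(lower + i * h)
    return total * h

def snr_ratio(f_u, f_n, bandwidth):
    """Right-hand side of (E13) divided by P_u / P_n."""
    shift = math.log(f_u / f_n)
    span = 10.0 * bandwidth + abs(shift)
    num = integrate(lambda f: filter_power(f, bandwidth) ** 2, -span, span, 4000)
    den = integrate(lambda f: filter_power(f, bandwidth) * filter_power(f + shift, bandwidth),
                    -span, span, 4000)
    return num / den

b_v = 0.2   # hypothetical bandwidth in log-frequency units
for f_n in (600.0, 800.0, 1000.0, 1250.0, 1600.0):
    print(f_n, snr_ratio(1000.0, f_n, b_v))
```

The printed ratio is 1 when the noise sits at the signal frequency and grows rapidly once |ln(f_u/f_n)| exceeds the filter bandwidth, which is the critical band behavior described by (E14) to (E16).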


APPENDIX  F:  SLOPE  OF  THE  P(C)  VERSUS  SNR 
CURVE  FOR  A 2AFC  EXPERIMENT 


This appendix is concerned with the interpretation of data
from a two-alternative forced choice (2AFC) detection task. In
particular, probability of correct response, P(C), as a function of
signal energy, E_u, is to be compared for a matched filter and an
energy detector. It will be shown that, for a given value of P(C),
the slope of the P(C) versus E_u curve is larger for the matched
filter case. The increase in slope depends upon the number of
degrees of freedom, K, in the energy detector. This dependence
is to be expected, since K = 2 corresponds to envelope detection
at the output of a matched filter.

The reason for our interest in the slope of the P(C) versus
energy curve is that a best fit to the measured psychometric function
apparently has a steeper slope than the curve that is predicted on the
basis of an energy detector. This disparity is clearly shown in
Figure 8-3 of Green and Swets (1966). We suspect that this steeper
slope is indicative of a matched filtering mechanism.

For a sinusoidal signal with fixed duration and a fixed
internal filter bandwidth, K is fixed. Signal-to-noise ratio
(2E_u/N_0) is varied by changing the power of the signal. When the
signal is absent, the energy detector output is represented by a
random variable x that has a chi-square distribution with mean K
and variance 2K. When the signal is present, the response of the
energy detector is a random variable y that has a noncentral
chi-square distribution with mean K + 2E_u/N_0 and variance 2K + 8E_u/N_0,
where E_u is signal energy and N_0/2 is noise power spectral density.
If K \geq 10, Green and Swets (1966) contend that the chi-square and
noncentral chi-square distributions are approximately normal.
In this case, the distribution of z = y - x is Gaussian with mean
m_{ED} = 2E_u/N_0 and variance \sigma_{ED}^2 = 4K + 8E_u/N_0. In a 2AFC
procedure, the probability of a correct decision is the probability
that y - x > 0, i.e.,

    P_{ED}(C) = \Phi( m_{ED} / \sigma_{ED} ) = \Phi( h_{ED} )        (F1)

    P_{MF}(C) = \Phi( m_{MF} / \sigma_{MF} ) = \Phi( h_{MF} )        (F2)

where \Phi(\cdot) is the standard normal distribution function.
For a given value of P(C), h = \Phi^{-1}[ P(C) ]. For example, if
P_{ED}(C) = P_{MF}(C) = 0.75, then

    h_{ED} = h_{MF} = 0.68 .        (F3)

For P(C) = 0.75, we shall calculate the ratio r of the
slopes of P_{MF}(C) and P_{ED}(C), as functions of E_u, i.e.,

    r = \frac{ dP_{MF}(C)/dE_u }{ dP_{ED}(C)/dE_u }

      = \frac{ [ d\Phi(h_{MF})/dh_{MF} ] \, dh_{MF}/dE_u |_{E_u = E_{MF}} }
             { [ d\Phi(h_{ED})/dh_{ED} ] \, dh_{ED}/dE_u |_{E_u = E_{ED}} } .        (F4)

From (F3), we have

    r = \frac{ dh_{MF}/dE_u |_{E_u = E_{MF}} }{ dh_{ED}/dE_u |_{E_u = E_{ED}} }        (F5)

where

    h_{MF} = m_{MF} / \sigma_{MF} = (E_u/N_0)^{1/2} = 0.68   at E_u = E_{MF} ,        (F6)

    h_{ED} = m_{ED} / \sigma_{ED} = (E_u/N_0) / (K + 2E_u/N_0)^{1/2} = 0.68   at E_u = E_{ED} .        (F7)



It is easy to show that r > 1. Equating (F6) and (F7), we


which means that the matched filter curve always lies to the left of
the energy detector curve. This displacement of the P(C) versus
2E_u/N_0 curve could ordinarily be used to differentiate between a
matched filter and an energy detector, but one can argue that
2E_u/N_0 cannot be directly measured due to internal noise. From
(F9), it is apparent that


                                                        (F10)


It is also obvious that

    K + E_{ED}/N_0 < K + 2E_{ED}/N_0 .        (F11)

From (F10) and (F11),

    2 (E_{MF}/N_0)^{1/2} (K + E_{ED}/N_0) < (K + 2E_{ED}/N_0)^{3/2} .        (F12)

Comparing (F12) to (F8), we see that

    r > 1 ,        (F13)

which means that the slope of the P(C) versus E_u curve, for any
given value of P(C), is steeper for the matched filter than for the
energy detector. The actual value of r is dependent upon the
specific values of K and P(C).

Equation (F8) was obtained under the assumption that K \geq 10,
since the chi-square distributions were approximated by normal
distributions. If K >> 1 and K >> 2E_u/N_0, then (F6) and (F7) become

    E_{MF}/N_0 = (0.68)^2

    E_{ED}/N_0 = 0.68 K^{1/2}

and (F8) becomes

    r = 0.735 K^{1/2} .        (F14)

Since r in (F14) increases with K, we shall choose K = 10 to
obtain the least value of r that is allowed under the assumption
that K \geq 10. For K = 10, (F6) and (F7) give

    E_{MF}/N_0 = (0.68)^2

    E_{ED}/N_0 = 2.66 .

Substituting these results into (F8), we obtain

    r = 3.5 .        (F15)
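The numbers quoted in (F14) and (F15) can be reproduced from (F6) and (F7) alone. The sketch below is a reconstruction under stated assumptions: it solves (F7) for the energy detector operating point and takes the slope ratio to be the ratio of the analytic derivatives of h_MF and h_ED, as in (F5), which gives r = (K + 2x_ED)^{3/2} / [2 x_MF^{1/2} (K + x_ED)] with x = E/N_0. The function names are hypothetical.

```python
import math

H = 0.68   # value of h at P(C) = 0.75, from (F3)

def h_mf(x):
    """(F6): matched filter detectability index, x = E_u / N_0."""
    return math.sqrt(x)

def h_ed(x, k):
    """(F7): energy detector detectability index, x = E_u / N_0."""
    return x / math.sqrt(k + 2.0 * x)

def operating_points(k, h=H):
    """Solve h_mf(x_mf) = h and h_ed(x_ed, k) = h for the two energies."""
    x_mf = h * h
    # h_ed = h  <=>  x^2 - 2 h^2 x - h^2 k = 0, positive root:
    x_ed = h * h + math.sqrt(h ** 4 + h * h * k)
    return x_mf, x_ed

def slope_ratio(k, h=H):
    """Ratio of dh_mf/dx at x_mf to dh_ed/dx at x_ed, as in (F5)."""
    x_mf, x_ed = operating_points(k, h)
    slope_mf = 1.0 / (2.0 * math.sqrt(x_mf))
    slope_ed = (k + x_ed) / (k + 2.0 * x_ed) ** 1.5
    return slope_mf / slope_ed

x_mf, x_ed = operating_points(10)
print(x_mf, x_ed, slope_ratio(10))
print(slope_ratio(1e6) / math.sqrt(1e6))   # approaches the coefficient in (F14)
```

For K = 10 this gives E_ED/N_0 close to 2.66 and a slope ratio close to 3.5, and for very large K the ratio divided by K^{1/2} approaches 1/(2 x 0.68), in agreement with (F14).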


A best fit to the data in Figure 8-3 of Green and Swets (1966)
indicates that the slope of the data curve divided by the slope of the
energy detection curve at P(C) = 0.75 is

    r' = 1.5 .        (F16)

The slope ratio r for a matched filter relative to an energy detector
with ten or more degrees of freedom is therefore greater than the
actual slope ratio r' that is observed in the data. Although a
best fit to the data seems to indicate a receiver that is better than
an energy detector, this receiver is not as good as a perfect matched
filter, if K \geq 10 in the energy detector. A similar conclusion was
drawn by Taylor and Forbes (1969) as the result of a monaural
detection experiment with the noise-free signal presented simultaneously to
the other ear. Perhaps the actual detector is an imperfect matched
filter (based upon an imperfectly reconstructed signal) or an
estimator-correlator.

Incidentally, the relative performance of the matched filter
and energy detector for the 2AFC task is very different from the
result in Figure 3, which was not based upon a forced choice
experiment. In Figure 3, it was also assumed that power is fixed
and E_u/N_0 is varied by changing signal duration, i.e., by varying K.
For the 2AFC case with variable power and fixed K, where
2E_{ED}/N_0 << K, (F6) and (F7) indicate that the same performance
is obtained when

    (E_{MF}/N_0)^{1/2} = E_{ED} / (N_0 K^{1/2})

or

    \alpha \equiv SNR_{ED} / SNR_{MF} = \sqrt{2} \, K^{1/2} / SNR_{MF}^{1/2} .        (F17)

For a 2AFC experiment with a fixed value of K, where
K \geq 10 >> 2E_u/N_0, \alpha varies inversely with SNR_{MF}^{1/2}.
The relative performance of the matched filter is thus most
impressive for small signal-to-noise ratio, as one would
intuitively expect.


2.3 Parameter Estimation from Spectrograms

2.3.1 Brief Summary of Section 2.3

The concept of a locally optimum estimator is introduced
and applied to spectrogram analysis. Variance bounds for maximum
likelihood, locally optimum spectrogram estimates of range
(or time of occurrence), and Doppler shift (or frequency) are
derived. The predicted standard deviation of a spectrogram
frequency estimate is compared to psychophysical data. The
comparison indicates that spectrogram correlation may be a viable model
for mammalian audition.

2.3.2  Introduction 

Neurophysiological  evidence  indicates  that  spectrogram- like 
signal  representations  may  be  formed  by  the  peripheral  auditory 
system  (Evans,  1975  and  1977;  Pfeiffer  and  Kim,  1975).  The  major 
fault  with  a spectrogram  model  is  that  at  low  frequencies  (below 
6 kHz)  extra  timing  information,  indicating  peak  or  zero  crossing 
locations  of  sine-wave  stimuli,  is  encoded  by  peripheral  neurons 
(Anderson,  et  al.  , 1971).  Sine  wave  peaks  or  zero  crossings  would 
not  be  apparent  if  ideal  envelope  detectors  were  used  at  the  outputs 
of  a bank  of  bandpass  filters,  as  in  a spectrogram  model.  Siebert 
(1970),  however,  has  found  that  auditory  frequency  discrimination 
can  best  be  explained  if  the  peak  or  zero  crossing  information  is 
disregarded  by  the  central  nervous  system.  Siebert's  result  there- 
fore favors  a conventional  spectrogram  representation  in  the  central 
nervous  system. 

In this chapter, we begin with the supposition that a spectrogram
is available to an ideal observer, and we derive the observer's
performance in estimating signal parameters from the spectrogram
under low signal-to-noise ratio conditions. Siebert (1970)
has used a similar approach to obtain estimator performance from
a more detailed physiological model that includes the shape of neural
tuning curves and a representation of neural firing patterns as
rate-modulated Poisson processes. Here, we begin with a conventional
spectrogram and "plug in" behavioral data rather than
neurophysiological data. The resulting estimator performance will be
seen to support the concept of a spectrogram representation in the
central nervous system.




It  will  be  assumed  that 


E »H 

• ■*  mi 


i.  e.  , that  data  and  hypothetical  spectrograms  are  normalized  to 
have  unit  volume.  From  Property  t>  in  Appendix  A of  Section  2.2, 
we  see  that  the  volume  normalization  condition  (2)  implies  that 
signal  and  filter  functions  are  energy  normalized.  From  (1)  and 
(2),  In  p ' z | s^  , Hj  | is  maximized  when 


H 

z s 
mn  mn 


is  maximized,  i.e.,  when  the  correlation  of  data  spectrogram  and 
hypothesized  spectrogram  is  maximized. 

For small SNR, a maximum likelihood (ML) estimate of
any signal parameter p is obtained by computing

$$\sum_m \sum_n z_{mn}\, s_{mn}(p_H) \tag{3}$$

for all $p_H$. The ML estimate $\hat{p}$ is the value of $p_H$ that maximizes
the above summation. The sum in (3) is a spectrogram correlation
process, i.e., it is a correlation of the data spectrogram with a
sequence of hypothesized noise-free signal spectrograms.
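The search over hypothesized parameter values in (3) can be sketched as follows. This is a minimal illustration, not the report's implementation: the function names, the dictionary of templates, and the toy `shifted_blob` template are all assumptions introduced here, and spectrograms are stored as lists of rows indexed by time sample m and frequency sample n.

```python
def spectrogram_correlation(z, s):
    """Sum of z[m][n] * s[m][n] over all time-frequency samples, as in (3)."""
    return sum(z_mn * s_mn
               for z_row, s_row in zip(z, s)
               for z_mn, s_mn in zip(z_row, s_row))


def ml_estimate(z, templates):
    """Return the hypothesized parameter whose template correlates best with z.

    `templates` (a hypothetical container) maps each hypothesized value p_H
    to a noise-free, unit-volume spectrogram s(p_H).
    """
    return max(templates,
               key=lambda p_h: spectrogram_correlation(z, templates[p_h]))


def shifted_blob(delay, n_time=8, n_freq=3):
    """Toy unit-volume template: all energy in time sample `delay`."""
    return [[(1.0 / n_freq if m == delay else 0.0) for _ in range(n_freq)]
            for m in range(n_time)]


if __name__ == "__main__":
    templates = {d: shifted_blob(d) for d in range(8)}
    # Data: the delay-5 template sitting on a small constant noise floor.
    data = [[value + 0.01 for value in row] for row in shifted_blob(5)]
    print(ml_estimate(data, templates))
```

Because every template has the same volume, the comparison between hypotheses depends only on the correlation sum, which is the point made in the text.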


2.3.4  A Variance  Bound  for  the  Locally  Optimum 
ML  Estimate 


For a large number of observations, the variance of an ML
estimate approaches the Cramér-Rao (CR) lower bound. An ML
estimate for low SNR data is implemented by spectrogram correlation.
The asymptotic accuracy of parameter estimation by means
of a spectrogram correlator is therefore given by a CR bound that
is computed for low SNR data.

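For an estimate $\hat{p}$ of a parameter p, the CR bound is

$$\mathrm{Var}(\hat{p} - p) \ge \left\{ -E\left[ \frac{\partial^2}{\partial p^2} \ln p[\mathbf{z} \mid \mathbf{s}(p), H_1] \right] \right\}^{-1}. \tag{4}$$

For low SNR, $\ln p[\mathbf{z} \mid \mathbf{s}(p), H_1]$ is given by (1), and

$$-E\left\{ \frac{\partial^2}{\partial p^2} \ln p[\mathbf{z} \mid \mathbf{s}(p), H_1] \right\} \approx -\sum_{m,n} s_{mn}''(p)\, s_{mn}(p) / \sigma^4 \tag{5}$$

where $s_{mn}''(p) = (\partial^2 / \partial p^2)\, s_{mn}(p)$.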
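As a numerical illustration of (4) and (5), the sketch below evaluates the sum in (5) by finite differences for a one-dimensional template. The Gaussian-shaped template, the sampling grid, and all function names are assumptions made only for this example; they are not taken from the report.

```python
import math


def template(t_samples, delay, width=1.0):
    """Hypothetical noise-free template s_m(p): a Gaussian bump centered at `delay`."""
    return [math.exp(-((t - delay) ** 2) / (2.0 * width ** 2)) for t in t_samples]


def information(t_samples, delay, sigma=1.0, h=1e-3):
    """Right side of (5): -sum_m s_m''(p) s_m(p) / sigma^4.

    The second derivative with respect to the parameter is approximated
    by a central difference with step h.
    """
    s_minus = template(t_samples, delay - h)
    s_zero = template(t_samples, delay)
    s_plus = template(t_samples, delay + h)
    total = 0.0
    for a, b, c in zip(s_minus, s_zero, s_plus):
        second_derivative = (a - 2.0 * b + c) / (h * h)
        total += second_derivative * b
    return -total / sigma ** 4


def cr_bound(t_samples, delay, sigma=1.0):
    """Variance bound of (4) under the low-SNR approximation (5)."""
    return 1.0 / information(t_samples, delay, sigma)


if __name__ == "__main__":
    grid = [0.1 * k for k in range(-100, 101)]
    print(information(grid, 0.0), cr_bound(grid, 0.0))
```

For this particular template the sum can be checked against the integral of the squared first derivative, which is sqrt(pi)/2 per unit sample spacing when the width is 1.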

The dependence of the bound in (5) upon signal and filter
parameters can be determined by specifying $s_{mn}(p)$ in terms of
the signal u(t) and the filter v(t) that are used to construct the
spectrogram.

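The frequency window (bank-of-filters) spectrogram can be
written in terms of frequency-domain functions or time-domain
functions. The frequency domain form is (Ackroyd, 1971)

$$s_{mn}(p) = \left| \int U(f,p)\, V(f - f_n) \exp(j2\pi f t_m)\, df \right|^2 \tag{6}$$

where unlabeled integration limits correspond to $(-\infty, \infty)$. In (6),
U(f,p) is the Fourier transform of the input signal (which depends
upon the parameter p), and V(f) is the basic filter function that is
used to construct the spectrogram. The time-domain version of
the spectrogram is

$$s_{mn}(p) = \left| \int u(t,p)\, v(t_m - t) \exp(-j2\pi f_n t)\, dt \right|^2. \tag{7}$$

Differentiating (6) twice with respect to p,

$$s_{mn}''(p) = 2 \left| \int U'(f,p)\, V(f - f_n)\, e^{j2\pi f t_m}\, df \right|^2 + 2\, \mathrm{Re} \left\{ \int U^*(f,p)\, V^*(f - f_n)\, e^{-j2\pi f t_m}\, df \int U''(f,p)\, V(f - f_n)\, e^{j2\pi f t_m}\, df \right\} \tag{8}$$

where the primes denote differentiation with respect to p. From
(7),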


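$$s_{mn}''(p) = 2 \left| \int u'(t,p)\, v(t_m - t)\, e^{-j2\pi f_n t}\, dt \right|^2 + 2\, \mathrm{Re} \left\{ \int u^*(t,p)\, v^*(t_m - t)\, e^{j2\pi f_n t}\, dt \int u''(t,p)\, v(t_m - t)\, e^{-j2\pi f_n t}\, dt \right\}. \tag{9}$$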


Equations (6)-(9) can be substituted into (5) in order to determine
the effect of signal and filter functions upon variance estimates.

In the following sections, the above analysis will be applied to range
and Doppler estimates, and some of the results will be compared
with behavioral data.

2.3.5 Delay (Range) Estimation

For delay estimation, U(f,p) in (6) becomes

$$U(f,\tau) = U(f) \exp(-j2\pi f \tau). \tag{10}$$


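Substituting (10) into (8), we obtain

$$s_{mn}''(\tau) = 2 \left| \int 2\pi f\, U(f)\, V(f - f_n)\, e^{j2\pi f (t_m - \tau)}\, df \right|^2 - 2\, \mathrm{Re} \left\{ \int (2\pi f)^2\, U(f)\, V(f - f_n)\, e^{j2\pi f (t_m - \tau)}\, df \int U^*(f)\, V^*(f - f_n)\, e^{-j2\pi f (t_m - \tau)}\, df \right\}. \tag{11}$$

For a signal bandwidth $B_u$ that is much greater than the filter
bandwidth $B_v$, we can assume that

$$U(f) \approx U(f_n) \ \text{over the bandwidth of}\ V(f - f_n). \tag{12}$$

Using this assumption and changing variables in (11), we obtain

$$-s_{mn}''(\tau) \approx 2(2\pi)^2 |U(f_n)|^2 \left\{ \mathrm{Re} \left[ \int f^2 V(f)\, e^{j2\pi f (t_m - \tau)}\, df \int V^*(f)\, e^{-j2\pi f (t_m - \tau)}\, df \right] - \left| \int f\, V(f)\, e^{j2\pi f (t_m - \tau)}\, df \right|^2 \right\}. \tag{13}$$

From (12) and (13), we have

$$-s_{mn}''(\tau) \le 2(2\pi)^2 |U(f_n)|^2 \int f^2 |V(f)|\, df \int |V(f)|\, df \tag{14}$$

and

$$s_{mn}(\tau) \approx |U(f_n)|^2 \left| \int V(f)\, e^{j2\pi (f + f_n)(t_m - \tau)}\, df \right|^2 \le |U(f_n)|^2 \left[ \int |V(f)|\, df \right]^2. \tag{15}$$

From (14) and (15),

$$-\sum_{m} s_{mn}''(\tau)\, s_{mn}(\tau) \le 2M(2\pi)^2 |U(f_n)|^4 \int f^2 |V(f)|\, df \left[ \int |V(f)|\, df \right]^3$$

where M is the number of time samples such that $s_{mn}(\tau)$ is not
identically zero.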

We shall assume that the signal u(t) is a broadband pulse
with duration $T_u$ which is much less than the duration $T_v$ of the
filter impulse response. M is then the number of time samples in
the filter impulse response v(t). Recall from Section 2.2 that
the spectrogram is sampled at a rate of $4B_v$ samples/sec in the
time direction. It follows that

$$M = 4 B_v T_v \approx 4$$

and, from (14) and (15),

$$-\sum_{n=1}^{N} \sum_{m=1}^{M} s_{mn}''(\tau)\, s_{mn}(\tau) \le \sum_{n=1}^{N} 8(2\pi)^2 |U(f_n)|^4 \int f^2 |V(f)|\, df \left[ \int |V(f)|\, df \right]^3. \tag{16}$$

The normalized second moment

$$\int f^2 |V(f)|\, df \Big/ \int |V(f)|\, df = B_v^2 \tag{17}$$

is a measure of the mean square filter bandwidth. If $|U(f_n)|$ is
approximately constant over N frequency samples and is approximately
zero for all other frequency samples, then

$$|U(f_n)|^4 \left[ \int |V(f)|\, df \right]^4 / \sigma^4 = \mathrm{SNR}^2 \tag{18}$$

is a measure of squared signal-to-noise ratio, and (16) becomes

$$-\sum_{n=1}^{N} \sum_{m=1}^{M} s_{mn}''(\tau)\, s_{mn}(\tau) / \sigma^4 \le 8(2\pi)^2 N B_v^2\, \mathrm{SNR}^2, \tag{19}$$

where N is the number of non-zero frequency samples of the signal
spectrogram, which is sampled at a rate of $4T_v$ samples/Hz in the
frequency direction (from Section 2.2). For a signal bandwidth $B_u$,
we have

$$N = 4 T_v B_u \approx 4 B_u / B_v \ \text{samples}. \tag{20}$$

Substituting (20) into (19) and using (4) and (5),

$$\mathrm{Var}(\hat{\tau} - \tau) \ge \left[ 32 (2\pi)^2 B_v B_u\, \mathrm{SNR}^2 \right]^{-1}. \tag{21}$$

For a broadband pulse with $T_u \ll T_v$ and $B_u \gg B_v$, the
standard deviation in the time of arrival estimate is

$$\Delta\tau \ge (c_0 B_v^{1/2})^{-1} B_u^{-1/2}\, \mathrm{SNR}^{-1} \tag{22}$$

where $c_0$ is a constant. This result is significantly different from
the estimation of range with a matched filter, which has standard
deviation (Cook and Bernfeld, 1967)

$$\Delta\tau_{MF} \ge \begin{cases} B_u^{-1}\, \mathrm{SNR}^{-1/2} & \text{for large SNR} \\ B_u^{-1}\, \mathrm{SNR}^{-1} & \text{for small SNR.} \end{cases} \tag{23}$$

It would seem that (22) and (23) provide a basis for discriminating
between a matched filter model and a spectrogram correlation model
in animal echolocation. Some relevant behavioral tests will be
suggested further on.
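The difference in bandwidth dependence between the two models can be made concrete with a short calculation. The sketch below evaluates the square root of the variance bound for a spectrogram correlator, $[32(2\pi)^2 B_v B_u \mathrm{SNR}^2]^{-1/2}$, and the small-SNR matched filter scaling $B_u^{-1} \mathrm{SNR}^{-1}$; the function names and the numerical values are illustrative assumptions, and the matched filter expression is a scaling law only, without its constant.

```python
import math


def spectrogram_delay_std(b_u, b_v, snr):
    """Square root of the low-SNR variance bound for a spectrogram correlator."""
    return 1.0 / math.sqrt(32.0 * (2.0 * math.pi) ** 2 * b_v * b_u * snr ** 2)


def matched_filter_delay_scaling(b_u, snr):
    """Small-SNR matched filter scaling, B_u^-1 SNR^-1 (constant omitted)."""
    return 1.0 / (b_u * snr)


if __name__ == "__main__":
    # Quadrupling the signal bandwidth at fixed SNR (values are arbitrary).
    for b_u in (1.0e3, 4.0e3):
        print(b_u,
              spectrogram_delay_std(b_u, b_v=100.0, snr=0.1),
              matched_filter_delay_scaling(b_u, snr=0.1))
```

Quadrupling $B_u$ halves the spectrogram correlator bound but reduces the matched filter figure by a factor of four, which is the contrast a behavioral test would look for.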


2.3.6  Frequency  (Doppler  Shift)  Estimation 

For Doppler shift estimation, u(t,p) in (7) becomes

$$u(t, f_d) = u(t) \exp[j2\pi (f_u - f_d) t] \tag{24}$$

where $f_u$ is the frequency of the transmitted signal. From (9),

$$s_{mn}''(f_d) = 2 \left| \int 2\pi t\, u(t)\, v(t_m - t)\, e^{-j2\pi (f_n + f_d - f_u) t}\, dt \right|^2 - 2\, \mathrm{Re} \left\{ \int (2\pi t)^2\, u(t)\, v(t_m - t)\, e^{-j2\pi (f_n + f_d - f_u) t}\, dt \int u^*(t)\, v^*(t_m - t)\, e^{j2\pi (f_n + f_d - f_u) t}\, dt \right\}. \tag{25}$$
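For a signal envelope u(t) that is relatively smooth and is much
longer than $T_v$, we can assume that

$$u(t) \approx u(t_m) \ \text{over the duration of}\ v(t_m - t). \tag{26}$$

Using this assumption in (25) and changing variables, we obtain

$$-s_{mn}''(f_d) \approx 2(2\pi)^2 |u(t_m)|^2 \left\{ \mathrm{Re} \left[ \int t^2 v(t)\, e^{j2\pi f t}\, dt \int v^*(t)\, e^{-j2\pi f t}\, dt \right] - \left| \int t\, v(t)\, e^{j2\pi f t}\, dt \right|^2 \right\} \tag{27}$$

where

$$f \triangleq f_n + f_d - f_u = f_n - (f_u - f_d). \tag{28}$$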

From (26) and (27),

$$-s_{mn}''(f_d) \le 2(2\pi)^2 |u(t_m)|^2 \int t^2 |v(t)|\, dt \int |v(t)|\, dt \tag{29}$$

and

$$s_{mn}(f_d) \approx |u(t_m)|^2 \left| \int v(t)\, e^{j2\pi f t}\, dt \right|^2 \le |u(t_m)|^2 \left[ \int |v(t)|\, dt \right]^2. \tag{30}$$

Therefore,

$$-s_{mn}''(f_d)\, s_{mn}(f_d) / \sigma^4 \le 2(2\pi)^2 |u(t_m)|^4 \int t^2 |v(t)|\, dt \left[ \int |v(t)|\, dt \right]^3 / \sigma^4 \tag{31}$$

where

$$\int t^2 |v(t)|\, dt \Big/ \int |v(t)|\, dt = T_v^2 \tag{32}$$

is a measure of mean square duration of v(t). Assuming that
$|u(t_m)|$ is approximately constant over M time samples and is
approximately zero elsewhere,

$$|u(t_m)|^4 \left[ \int |v(t)|\, dt \right]^4 / \sigma^4 = \mathrm{SNR}^2 \tag{33}$$

is a measure of squared signal-to-noise ratio. From (31)-(33),
we have

$$-\sum_{m=1}^{M} s_{mn}''(f_d)\, s_{mn}(f_d) / \sigma^4 \le 2(2\pi)^2 M T_v^2\, \mathrm{SNR}^2 \tag{34}$$


where M is the number of non-zero responses of the nth filter,
taken to be the same for all n. The response of each filter is
approximately $T_u$ seconds long, and the response is sampled at a rate of
$4B_v$ samples/sec. It follows that

$$M \approx 4 B_v T_u \approx 4 T_u / T_v \ \text{samples} \tag{35}$$

and

$$-\sum_{n=1}^{N} \sum_{m=1}^{M} s_{mn}''(f_d)\, s_{mn}(f_d) / \sigma^4 \le 8N (2\pi)^2 T_v T_u\, \mathrm{SNR}^2. \tag{36}$$


In (36), N is the number of non-zero samples in the frequency
direction. For a very narrowband signal with center frequency
$f_u - f_d$ Hz, the only filters that have a non-zero response have
center frequencies $f_n$ such that
$(f_u - f_d) - B_v/2 < f_n < (f_u - f_d) + B_v/2$.
For a non-zero response, $f_n$ covers a band that is $B_v$ Hz
wide. Since $f_n$ is sampled every $(4T_v)^{-1}$ Hz $\approx B_v/4$ Hz, the
number of non-zero frequency samples is

$$N \approx 4 B_v^{-1}\ \text{samples/Hz} \times B_v\ \text{Hz} = 4 \tag{37}$$

and

$$\mathrm{Var}(\hat{f}_d - f_d) \ge \left[ 32 (2\pi)^2 T_v T_u\, \mathrm{SNR}^2 \right]^{-1}. \tag{38}$$


For a narrowband pulse with $T_u \gg T_v$ and $B_u \ll B_v$,
the standard deviation of a Doppler or signal frequency estimate
is

$$\Delta f \ge (c_0 T_v^{1/2})^{-1} T_u^{-1/2}\, \mathrm{SNR}^{-1}. \tag{39}$$


If $n_H$ harmonics are used in addition to a fundamental component,
then N in (36) becomes $N(1 + n_H)$, and

$$\Delta f \ge (c_0 T_v^{1/2})^{-1} [(1 + n_H)\, T_u]^{-1/2}\, \mathrm{SNR}^{-1}. \tag{40}$$

If a frequency estimate is obtained with a bank of matched
filters, then (Cook and Bernfeld, 1967)

$$\Delta f \ge \begin{cases} T_u^{-1}\, \mathrm{SNR}^{-1/2} & \text{for large SNR} \\ T_u^{-1}\, \mathrm{SNR}^{-1} & \text{for small SNR.} \end{cases} \tag{41}$$

The standard deviation in (39) can be compared with behavioral data.
Such a comparison is described below. Comparison of behavioral
data with (39) and (41) should indicate whether spectrogram
correlation or matched filtering is closer to reality.
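A small calculation shows how the frequency bound responds to tone duration and to the number of harmonics. The sketch takes the variance bound $[32(2\pi)^2 T_v T_u \mathrm{SNR}^2]^{-1}$ and, as in the text, multiplies the number of frequency samples by $(1 + n_H)$ when harmonics are present; the function name and the numbers are assumptions for illustration only.

```python
import math


def doppler_std_bound(t_v, t_u, snr, n_harmonics=0):
    """Square root of the low-SNR variance bound on a frequency estimate.

    n_harmonics scales the number of non-zero frequency samples N by
    (1 + n_harmonics), which divides the variance bound by the same factor.
    """
    variance_bound = 1.0 / (32.0 * (2.0 * math.pi) ** 2
                            * t_v * t_u * (1 + n_harmonics) * snr ** 2)
    return math.sqrt(variance_bound)


if __name__ == "__main__":
    base = doppler_std_bound(t_v=0.01, t_u=0.1, snr=0.1)
    longer = doppler_std_bound(t_v=0.01, t_u=0.4, snr=0.1)
    richer = doppler_std_bound(t_v=0.01, t_u=0.1, snr=0.1, n_harmonics=3)
    print(base, longer, richer)
```

A fourfold increase in tone duration, or the addition of three harmonics, each halves the bound in this model.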

2.3.7 Recommended Experiments and Comparison of
Results with Existing Data

Error bounds have been obtained for estimates that use
matched filters and for estimates that are derived from spectrograms.
These bounds can be compared with behavioral data in order
to determine whether spectrogram correlation or matched filter
models are more descriptive of auditory signal processing in
mammals.

It has been shown that the standard deviation of a delay
measurement using spectrogram correlation is asymptotically
proportional to $(\mathrm{SNR})^{-1} B_u^{-1/2}$, where $B_u$ is signal bandwidth and SNR
is a measure of signal-to-noise ratio. Delay measurement with a
matched filter, on the other hand, has a standard deviation that
varies as $(\mathrm{SNR})^{-1} B_u^{-1}$ under similar conditions.

An animal or human can be trained to indicate which of two
pulse-pairs has the shorter time between pulses, or to distinguish
between a single pulse and a pulse pair until the interval between
pulses in the pair becomes too small. The bandwidth of the pulses
can then be varied for a fixed SNR. The accuracy of an interpulse
delay measurement or the ability to resolve two closely spaced
pulses will vary as $B_u^{-1}$ if a matched filter model is correct and
as $B_u^{-1/2}$ if a spectrogram correlator model is more descriptive of
auditory processing.


Experimental data for frequency discrimination have already
been published by Licklider (1951), Chih-an and Chistovich (1961),
and Oetinger (1959). Siebert (1970) has fit these data with the
equation

$$\Delta f_{data} = c_1\, f\, T_u^{-1/2} \tag{42}$$

where $c_1$ is a constant.

The dependence of $\Delta f_{data}$ upon $T_u^{-1/2}$ is consistent with the
bound in (39), which also varies with $T_u^{-1/2}$. In fact, (42) and (39)
will be identical if

$$\mathrm{SNR}^{-1}\, T_v^{-1/2} \propto f. \tag{43}$$

The duration $T_v$ of a filter impulse response can be
experimentally determined by exciting the filter with a time-gated sine
wave. As the duration of the sine wave becomes longer, the filter
output power increases, until the duration of the sine wave equals
$T_v$. For longer sine wave durations, the filter output power stays
constant.




Threshold as a function of sine wave duration has been measured
for human subjects (Plomp and Bouman, 1959). Threshold
decreases with increased stimulus duration up to a duration $\tau$.
Figure 1 shows $\tau$ as a function of f, as measured by Plomp
and Bouman (1959). The solid line in Figure 1 connects the mean
values of $\tau$ for two different observers. This line is replotted in
Figure 2 on coordinates of $\tau^{-1/2}$ versus frequency. From Figure 2,
we see that the data can be approximated by a straight line, i.e.,

$$\tau^{-1/2} \propto f \tag{44}$$

between 0.5 kHz and 8 kHz. Assuming that $\tau \propto T_v$ and that SNR
is constant, an auditory spectrogram correlator should have
frequency uncertainty

$$\Delta f \ge (\text{constant})\, f\, T_u^{-1/2} \tag{45}$$

which is the same as $\Delta f_{data}$ in (42).

Even if $T_v^{-1/2}$ were not proportional to f, the fact that $\Delta f$
is proportional to $T_u^{-1/2}$ rather than to $T_u^{-1}$ would seem to
indicate that a spectrogram correlator model is better than a matched
filter model for auditory frequency estimation.
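The link between the filter duration and the frequency dependence of the bound can be checked numerically. In the sketch below, the filter duration is assumed to follow $T_v(f) = (k/f)^2$, so that $T_v^{-1/2} \propto f$; the constants `k` and `c0`, the unit SNR, and the function names are hypothetical choices for illustration.

```python
def filter_duration(freq_hz, k=1.0):
    """Assumed model T_v(f) = (k / f)^2, i.e. T_v^(-1/2) proportional to f."""
    return (k / freq_hz) ** 2


def frequency_std_scaling(freq_hz, t_u, snr=1.0, c0=1.0, k=1.0):
    """Bound of the form (c0 T_v^(1/2))^-1 T_u^(-1/2) SNR^-1 with T_v = T_v(f)."""
    t_v = filter_duration(freq_hz, k)
    return (1.0 / (c0 * t_v ** 0.5)) * t_u ** -0.5 / snr


if __name__ == "__main__":
    for f in (500.0, 1000.0, 2000.0):
        # The ratio to f stays constant when T_v^(-1/2) is proportional to f.
        print(f, frequency_std_scaling(f, t_u=0.1) / f)
```

Under this assumption the bound divided by frequency is constant at fixed tone duration and SNR, which reproduces the proportionality to f seen in the discrimination data.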


2.3.8  Conclusion 


For low signal-to-noise ratio, a maximum likelihood
parameter estimate is obtained from spectrogram data by means
of a spectrogram correlation operation. The variance of this
estimate is asymptotically equal to the Cramér-Rao lower bound.
This bound has been calculated for an estimate of delay (using a
broadband pulse that is narrow in time), and for an estimate of
frequency or Doppler shift (using a long-duration, narrowband
signal).

Figure 1. Duration parameter $\tau$ as a function of frequency
(from Plomp and Bouman, 1959).

Figure 2. The solid line in Figure 1, replotted on
coordinates of $\tau^{-1/2}$ versus frequency.

The variances of the spectrogram correlator delay and
frequency estimates are different from the variances that are
associated with a matched filter (for delay) or a bank of matched
filters (for Doppler). These differences can be used to construct
behavioral tests to distinguish between matched filter and
spectrogram correlator models of animal echolocation.

Frequency estimation data have already been obtained from
human subjects. These data indicate that the standard deviation of
a frequency estimate is inversely proportional to the square root
of tone duration. The spectrogram correlator frequency estimate
is also such that $\Delta f \propto T_u^{-1/2}$, where $T_u$ is signal duration.

The experimentally observed standard deviation is proportional
to the frequency of a sinusoidal signal, as well as to $T_u^{-1/2}$.
The same frequency dependence occurs in the spectrogram correlator
estimate provided that $T_v$, the impulse response duration of
a filter that is used to construct the spectrogram, is such that
$T_v^{-1/2} \propto f$. An estimate of $T_v$, obtained from auditory threshold
as a function of tone duration, indicates that $T_v^{-1/2}$ may indeed be
proportional to frequency.

The standard deviation of the spectrogram correlator frequency
estimate is therefore very similar to behavioral data. This
similarity suggests that spectrogram correlation may be utilized
in mammalian audition. An additional test of this hypothesis is to
measure the just noticeable delay difference between two broadband
pulses as a function of pulse bandwidth and signal-to-noise ratio,
and to compare the data with $\Delta\tau$ in (22), the calculated standard
deviation of the corresponding spectrogram correlator estimate.

2.4 Spectrogram Correlation as an Ideal Detection Process
for Signals that Have Been Passed Through a Time-Varying,
Random Channel

2.4.1 Brief Summary of Section 2.4

When a random or deterministic signal is passed through
a time-varying, random channel, the expected spectrogram of the
channel output is the two-dimensional convolution of the channel
scattering function and the expected spectrogram of the input
signal. In Gaussian noise, a locally optimum detector correlates
samples of the data spectrogram with corresponding samples of the
noise-free spectrogram of the channel output. In non-Gaussian
noise, a recent result of Poor and Thomas indicates that a modified
data spectrogram correlator should be used, where the usual
square-law envelope detector is replaced by a detector with a different
power law. These results are compared to a Karhunen-Loève
(K-L) detector implementation. The comparison indicates that under
low SNR conditions, the spectrogram correlator is equivalent to
the K-L detector, except that the spectrogram generally has more
degrees of freedom. The spectrogram correlator may be preferable
because:

(1) it does not require the solution of an eigenfunction
equation;

(2) it can be configured as an iterative estimation device
that adjusts to changes in the channel scattering
function;

(3) it can be implemented with standard FFT techniques;

(4) the number of degrees of freedom can be reduced by
transmitting a periodic signal; and

(5) more degrees of freedom allow for the implementation
of a more sensitive mismatched correlator for
maximization of signal-to-clutter ratio.

2.4.2  Introduction 

In Section 2.2, it was shown that, if we are given the spectrogram
of an observed time function, and if we know the noise-free
spectrogram that would be observed in the presence of a signal, then
a locally optimum detector can be implemented by correlating samples
of the data spectrogram with corresponding samples of the noise-free
signal spectrogram. Since we were concerned with models of hearing,
it was reasonable to assume that the data was indeed presented
to the detector in the form of a spectrogram.

In this section, we are concerned with the class of detection
problems for which it is optimum to describe the data in terms of its
spectrogram. In other words, we no longer assume that we are
given the data in the form of a spectrogram. In this section, we are
given the data waveform itself and we are free to build a matched
filter, a spectrogram correlator, or any other detection device.

If the noise is additive and Gaussian, and if the noise-free signal is
known except for time of arrival, the optimum detector is a
whiten-and-match filter followed by a threshold, and spectrogram
correlation is irrelevant. On the other hand, there may exist an important

The main result of this section is that there is indeed an
important detection problem with a solution that can be structured
in terms of a spectrogram correlation operation. This problem is
to detect a deterministic or random signal that has been passed
through a randomly time-varying channel, under low energy
coherence (LEC) conditions. Low energy coherence means that the energy
of the channel output is spread out over time and/or frequency, with
relatively small SNR over any one time-frequency interval. This
signal description is typical of sonar echo data from range and/or
Doppler distributed targets, from clutter or reverberation, and
from propagation through a time-varying, multipath channel.

The analysis will begin by obtaining an intuitively pleasing
result, viz., the expected spectrogram of the channel output is the
convolution, in time and frequency, of the channel scattering
function¹ with the expected spectrogram of the channel input. The input
spectrogram is thus "smeared" by the scattering function, which
describes the expected delay and Doppler variation that is introduced
by the channel. If a channel can be described in terms of its
scattering function, it follows that the expected noise-free spectrogram of
the channel output can be determined from knowledge of the
scattering function and the expected spectrogram of the input signal. If the
scattering function is itself unknown or slowly time varying, then the
scattering function at any given time can be estimated from a sequence
of previous spectrogram measurements.

A spectrogram representation therefore appears to be a
convenient and natural way to describe the response of a time-varying,
random channel. We have already shown in Section 2.2 that, under
LEC conditions, the locally optimum detector for a signal with known
spectrogram in white, Gaussian noise is a spectrogram correlator.
Poor and Thomas have recently generalized this result to include
non-Gaussian noise. If the noise that accompanies the data is not
Gaussian, a different power law is used for envelope detection,
rather than the usual square-law device, but the spectrogram
correlator configuration remains valid.

Another way to describe the channel output is in terms of
a Karhunen-Loève (K-L) expansion, i.e., in terms of the
eigenfunctions of the signal covariance matrix. The locally optimum
detector for the K-L representation will be compared to the
spectrogram correlator, and some advantages of the spectrogram
representation will become apparent.

Signal and filter design will also be considered from the
viewpoint of spectrogram correlation. Price's criterion for LEC
signal design can easily be obtained from the spectrogram
formulation. The use of a spectrogram allows the receiver to manipulate
the filter or time window that is used to construct the spectrogram,
as an alternative to adjusting the transmitted signal.

Finally, results will be described in terms of the usual bank
of matched filters that is used to detect a narrowband signal with
unknown Doppler shift. Such a filter bank can be used to construct
a special kind of spectrogram, and spectrogram correlation can be
applied to the envelope detected filter responses. A receiver that is
designed for detection (in white, Gaussian noise) of a signal that is
exactly known except for time of arrival and frequency can easily
be reconfigured to detect the random output of a time-varying
channel in non-Gaussian noise.


2.4.3 Scattering Functions and Spectrograms


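Bello has defined a symmetrized expected ambiguity
function $\psi_{uu}(\tau,\phi)$ as follows:

$$\psi_{uu}(\tau,\phi) = \int_{-\infty}^{\infty} E[u^*(t - \tau/2)\, u(t + \tau/2)] \exp(-j2\pi \phi t)\, dt. \tag{1}$$

From Ackroyd, the frequency-window spectrogram of a signal
u(t) is

$$S_{uv}(t_i, f_j) = \left| \int_{-\infty}^{\infty} U(f)\, V(f - f_j) \exp(j2\pi f t_i)\, df \right|^2 = \left| \int_{-\infty}^{\infty} u(t)\, v(t_i - t) \exp(-j2\pi f_j t)\, dt \right|^2. \tag{2}$$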

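It is shown in the Appendix (Section 2.4.9) that

$$\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} E[S_{uv}(t,f)]\, e^{-j2\pi (\phi t - \tau f)}\, dt\, df = \psi_{uu}(\tau,\phi)\, \psi_{vv}(-\tau,\phi). \tag{3}$$

In other words, the Fourier transform of a spectrogram is the
product of the signal ambiguity function and a filter ambiguity
function, or

$$F_2\{E[S_{uv}(t,f)];\ \tau,\phi\} = \psi_{uu}(\tau,\phi)\, \psi_{vv}(-\tau,\phi) \tag{4}$$

and

$$F_2\{E[S_{zv}(t,f)];\ \tau,\phi\} = \psi_{zz}(\tau,\phi)\, \psi_{vv}(-\tau,\phi) \tag{5}$$

where $F_2\{\cdot\}$ is a two-dimensional Fourier transform as in (3) and
$\psi_{zz}(\tau,\phi)$ is the ambiguity function of a channel output signal, z(t),
in the absence of noise.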

A fundamental input-output relation for time-varying
random channels is

$$\psi_{zz}(\tau,\phi) = R_H(\tau,\phi)\, \psi_{uu}(\tau,\phi) \tag{6}$$

where $R_H(\tau,\phi)$ is the time-frequency channel autocorrelation function

$$R_H(\tau,\phi) = E\{H^*(t,f)\, H(t + \tau, f + \phi)\} \tag{7}$$

and where H(t,f) is the time-varying transfer function of the channel.

Bello has defined the scattering function as

$$S(t,f) = \int \int R_H(\tau,\phi)\, e^{j2\pi (t\phi - f\tau)}\, d\tau\, d\phi = F_2^{-1}\{R_H(\tau,\phi);\ t,f\}. \tag{8}$$

From (6), we have

$$\psi_{zz}(\tau,\phi)\, \psi_{vv}(-\tau,\phi) = R_H(\tau,\phi)\, \psi_{uu}(\tau,\phi)\, \psi_{vv}(-\tau,\phi). \tag{9}$$

Taking the inverse Fourier transform of both sides of (9), we have

$$E\{S_{zv}(t,f)\} = S(t,f) * E\{S_{uv}(t,f)\} \tag{10}$$

where (*) denotes a two-dimensional convolution operation in time
and frequency.
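The smearing described by (10) can be illustrated on a sampled grid. The sketch below performs a direct two-dimensional convolution of a hypothetical sampled scattering function with a hypothetical expected input spectrogram; the grid sizes, the numerical values, and the function names are assumptions for illustration, and a practical implementation would more likely use FFT-based convolution.

```python
def convolve2d(scattering, spectrogram):
    """Full 2-D convolution of two grids stored as lists of rows.

    The first index is taken as the time sample and the second as the
    frequency sample.
    """
    rows = len(scattering) + len(spectrogram) - 1
    cols = len(scattering[0]) + len(spectrogram[0]) - 1
    out = [[0.0] * cols for _ in range(rows)]
    for i, s_row in enumerate(scattering):
        for j, s_val in enumerate(s_row):
            for k, x_row in enumerate(spectrogram):
                for q, x_val in enumerate(x_row):
                    out[i + k][j + q] += s_val * x_val
    return out


def volume(grid):
    """Sum of all samples of a grid."""
    return sum(sum(row) for row in grid)


if __name__ == "__main__":
    # Input energy concentrated in one time-frequency cell.
    expected_input = [[0.0, 0.0, 0.0],
                      [0.0, 1.0, 0.0],
                      [0.0, 0.0, 0.0]]
    # Channel with Doppler spread only (one time row, three frequency taps).
    scattering = [[0.25, 0.5, 0.25]]
    for row in convolve2d(scattering, expected_input):
        print(row)
```

In this example the single input cell is spread over three adjacent frequency cells, and the volume of the output equals the product of the two input volumes.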

Equation (10) says that the expected output spectrogram is the
two-dimensional convolution of the medium scattering function and
the expected input spectrogram. The scattering function therefore
determines the extent to which a given part of the signal in time
and frequency coordinates is smeared by the channel. This
channel-induced smear can be observed in practice by computing
the frequency window spectrogram of the channel response.
Alternatively, one can compute the time window spectrogram⁴

$$E\{S_{zw}(t_i, f_j)\} = E\left\{ \left| \int z(t)\, w(t - t_i) \exp(-j2\pi f_j t)\, dt \right|^2 \right\}. \tag{11}$$

By comparing (11) and (2), we see that

$$S_{zw}(t_i, f_j) = S_{zv}(t_i, f_j) \tag{12}$$

if

$$w(t) = v(-t). \tag{13}$$

The time window and frequency window spectrograms are identical
if the time window is appropriately defined.
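A discrete-time sketch of the two spectrogram forms may help here. It evaluates one sample of the frequency window (bank-of-filters) form and one sample of the time window form with $w(t) = v(-t)$, using sums in place of integrals. The raised-cosine impulse response, the test tone, and the function names are hypothetical choices, not the filters used in the report.

```python
import cmath
import math


def v_at(k, length=16):
    """Hypothetical causal filter impulse response, nonzero for 0 <= k < length."""
    if 0 <= k < length:
        return 0.5 - 0.5 * math.cos(2.0 * math.pi * (k + 0.5) / length)
    return 0.0


def w_at(k, length=16):
    """Time window defined as the time-reversed filter, w(t) = v(-t)."""
    return v_at(-k, length)


def frequency_window_sample(u, i, f):
    """|sum_t u[t] v[i - t] exp(-j 2 pi f t)|^2, the bank-of-filters form."""
    acc = sum(u[t] * v_at(i - t) * cmath.exp(-2j * math.pi * f * t)
              for t in range(len(u)))
    return abs(acc) ** 2


def time_window_sample(u, i, f):
    """|sum_t u[t] w[t - i] exp(-j 2 pi f t)|^2, the time window form."""
    acc = sum(u[t] * w_at(t - i) * cmath.exp(-2j * math.pi * f * t)
              for t in range(len(u)))
    return abs(acc) ** 2


if __name__ == "__main__":
    # Complex tone at 0.2 cycles/sample (an arbitrary test signal).
    tone = [cmath.exp(2j * math.pi * 0.2 * t) for t in range(64)]
    for f in (0.1, 0.2, 0.3):
        print(f, frequency_window_sample(tone, 40, f), time_window_sample(tone, 40, f))
```

The two columns of output agree sample by sample, and the largest value occurs for the filter centered on the tone frequency.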

2.4.4  Detection  Via  Spectrogram  Correlation 

Under hypothesis $H_1$, we have a random signal with known
expected spectrogram, added to white, Gaussian noise. Under $H_0$,
we have noise alone. In Section 2.2, it was shown that statistically
uncorrelated samples of the spectrogram are spaced $(4B_v)^{-1}$
seconds apart in the time direction and $(4T_v)^{-1}$ Hz apart in the
frequency direction, where the filter V(f) has bandwidth $B_v$ and
impulse response duration $T_v$. For Gaussian input noise, the
sampled spectrogram

$$z_{ij} = S_{rv}(t_i, f_j); \quad r(t) = z(t) + \text{noise} \tag{14}$$

has probability density function

$$p(z_{ij} \mid s_{ij}, H_1) = \sigma^{-2} \exp[-(z_{ij} + s_{ij})/\sigma^2]\ I_0\!\left( 2\sqrt{z_{ij} s_{ij}} / \sigma^2 \right) \tag{15}$$

when $H_1$ is true, where $\sigma^2$ is the average noise power and

$$s_{ij} = E\{S_{zv}(t_i, f_j)\}; \quad z(t) = \text{noise-free signal}. \tag{16}$$

If the signal is multiplied by a parameter A, then $s_{ij}$ is multiplied
by $A^2$. Replacing $s_{ij}$ by $A^2 s_{ij}$ in (15), we have

$$p(z_{ij} \mid s_{ij}, H_1) = \sigma^{-2} \exp[-(z_{ij}/\sigma^2 + d\, s_{ij})]\ I_0\!\left( 2\sqrt{d\, z_{ij} s_{ij}} / \sigma \right) \tag{17}$$

where

$$d = A^2 / \sigma^2 \tag{18}$$

is a signal-to-noise ratio parameter.

When $H_0$ is true, we have

$$p(z_{ij} \mid H_0) = \sigma^{-2} \exp(-z_{ij}/\sigma^2). \tag{19}$$

The likelihood ratio is

$$L_d(\mathbf{z}) = \prod_{i=1}^{N} \prod_{j=1}^{M} \exp(-d\, s_{ij})\ I_0\!\left( 2\sqrt{d\, z_{ij} s_{ij}} / \sigma \right). \tag{20}$$

The locally optimum test for $H_1$ vs. $H_0$ compares $\partial L_d(\mathbf{z}) / \partial d$
for d = 0 with a threshold. Substituting the standard series
representation for the modified Bessel function into (20), we have

$$\left. \frac{\partial L_d(\mathbf{z})}{\partial d} \right|_{d=0} = \sum_{i=1}^{N} \sum_{j=1}^{M} \left\{ -s_{ij} + \left. \frac{\partial}{\partial d} \sum_{k=0}^{\infty} \frac{[d\, z_{ij} s_{ij} / \sigma^2]^k}{(k!)^2} \right|_{d=0} \right\} = \sum_{i=1}^{N} \sum_{j=1}^{M} \left( -s_{ij} + z_{ij} s_{ij} / \sigma^2 \right). \tag{21}$$

The first term in the sum is independent of the data, $z_{ij}$. The
data-dependent operation is the computation of

$$\ell_S(\mathbf{z}) = \sum_{i=1}^{N} \sum_{j=1}^{M} z_{ij}\, s_{ij} \tag{22}$$

which is a spectrogram correlation operation.
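The step from the likelihood ratio to the correlation statistic can be checked numerically. The sketch below builds the likelihood ratio from the factors $\exp(-d\, s_{ij})\, I_0(2\sqrt{d\, z_{ij} s_{ij}}/\sigma)$, differentiates it at d = 0 by a forward difference, and compares the result with $\sum (-s_{ij} + z_{ij} s_{ij}/\sigma^2)$. The sample values, the step size, and the function names are assumptions made for the example; the Bessel function is evaluated from its power series so that only the standard library is needed.

```python
import math


def bessel_i0(x, terms=30):
    """Modified Bessel function I_0 from its series sum_k (x^2/4)^k / (k!)^2."""
    total, term = 0.0, 1.0
    for k in range(terms):
        if k > 0:
            term *= (x * x / 4.0) / (k * k)
        total += term
    return total


def likelihood_ratio(z, s, d, sigma=1.0):
    """Product over samples of exp(-d s) I_0(2 sqrt(d z s) / sigma)."""
    ratio = 1.0
    for z_row, s_row in zip(z, s):
        for z_ij, s_ij in zip(z_row, s_row):
            ratio *= (math.exp(-d * s_ij)
                      * bessel_i0(2.0 * math.sqrt(d * z_ij * s_ij) / sigma))
    return ratio


def locally_optimum_statistic(z, s, sigma=1.0):
    """Closed form of the derivative at d = 0: sum of (-s + z s / sigma^2)."""
    return sum(-s_ij + z_ij * s_ij / sigma ** 2
               for z_row, s_row in zip(z, s)
               for z_ij, s_ij in zip(z_row, s_row))


if __name__ == "__main__":
    z = [[1.0, 2.0], [0.5, 1.5]]   # hypothetical data spectrogram samples
    s = [[0.2, 0.3], [0.1, 0.4]]   # hypothetical noise-free template samples
    h = 1e-7
    numeric = (likelihood_ratio(z, s, h) - likelihood_ratio(z, s, 0.0)) / h
    print(numeric, locally_optimum_statistic(z, s))
```

Only the term containing $z_{ij} s_{ij}$ depends on the data, which is why the detector reduces to a spectrogram correlation.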

Poor and Thomas have generalized the above result for an
arbitrary noise p.d.f. f(x). Their result is equivalent to the
statistic

$$\ell_S(\mathbf{x}) = \sum_{i=1}^{N} \sum_{j=1}^{M} s_{ij}\, g(x_{ij}) \tag{23}$$

where $x_{ij}$ is the output of the jth filter at time i and

$$g(x) \triangleq f''(x) / f(x). \tag{24}$$

In non-Gaussian noise, samples of a modified data spectrogram with
a power law determined as in (24) are correlated with samples of an
ordinary signal spectrogram defined by (16).
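As a check on the form of (24), the nonlinearity $f''(x)/f(x)$ can be evaluated numerically for a Gaussian density, for which it reduces to $(x^2 - \sigma^2)/\sigma^4$, a square law plus a constant. The sketch below uses a central difference; the function names, step size, and test point are illustrative assumptions.

```python
import math


def gaussian_pdf(x, sigma=1.0):
    """Zero-mean Gaussian probability density with standard deviation sigma."""
    return math.exp(-x * x / (2.0 * sigma ** 2)) / (sigma * math.sqrt(2.0 * math.pi))


def nonlinearity(pdf, x, h=1e-4):
    """g(x) = f''(x) / f(x), with f'' approximated by a central difference."""
    second = (pdf(x + h) - 2.0 * pdf(x) + pdf(x - h)) / (h * h)
    return second / pdf(x)


if __name__ == "__main__":
    x = 2.0
    # For unit-variance Gaussian noise the exact value is x^2 - 1.
    print(nonlinearity(gaussian_pdf, x), x * x - 1.0)
```

For other noise densities the same function gives the modified detector characteristic, provided the density is smooth enough for the difference approximation to be meaningful.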

The locally optimum detector in (22)-(24) is ideal when each
sample of the data spectrogram has small signal-to-noise ratio.
If the samples $s_{ij}$ in (22) are all nonzero, it would appear that
the energy of the channel output has been "smeared out" over time
and/or frequency. This situation is called a low energy coherence
(LEC) condition.

The locally optimum detector is not ideal for larger SNR, but
for large SNR it is usually not critical to use the best possible
detection procedure.

2.4.5 K-L Representations Versus Spectrograms

The standard detector for the output of a time-varying,
random channel in the presence of Gaussian noise is obtained by
using a Karhunen-Loève (K-L) signal representation. The K-L
representation uses eigenfunctions of the signal covariance function
$K_z(x,y)$ as basis functions for decomposition of the data. In terms
of signal and channel representations,

$$K_z(x,y) = K_z(t + \tau/2,\ t - \tau/2) \quad \text{for}\ t = (x + y)/2;\ \tau = x - y,$$

where

$$K_z(t + \tau/2,\ t - \tau/2) = \int_{-\infty}^{\infty} \psi_{zz}(\tau,\phi) \exp(j2\pi t \phi)\, d\phi = \int_{-\infty}^{\infty} R_H(\tau,\phi)\, \psi_{uu}(\tau,\phi) \exp(j2\pi t \phi)\, d\phi. \tag{25}$$

J ri  uu 


The eigenfunctions $\varphi_i(t)$, i = 1, 2, ..., N are such that

$$\int_0^T K_z(x,y)\, \varphi_i(y)\, dy = \lambda_i\, \varphi_i(x), \quad 0 \le x \le T, \tag{26}$$

and the detection statistic is

$$\ell_{KL}(\mathbf{x}) = \sum_{i=1}^{N} \frac{\lambda_i}{\lambda_i + N_0/2}\, |x_i|^2$$

where $N_0/2$ is noise power spectral density and

$$|x_i|^2 = \left| \int_0^T r(t)\, \varphi_i(t)\, dt \right|^2. \tag{27}$$

In (27), r(t) is the data signal, which is z(t) + Gaussian noise
under $H_1$, and the number $|x_i|^2$ is the envelope detected response
of a filter with impulse response $\varphi_i(-t)$.


For the small SNR, LEC case,

$$\lambda_i \ll N_0/2, \quad i = 1, 2, \ldots, N,$$

and if the threshold is multiplied by $N_0/2$, we have

$$\ell_{KL}^{LEC}(\mathbf{x}) \approx \sum_{i=1}^{N} \lambda_i\, |x_i|^2. \tag{28}$$

In (28),

$$\lambda_i = \int_0^T \int_0^T \varphi_i^*(x)\, K_z(x,y)\, \varphi_i(y)\, dx\, dy = E\left\{ \int_0^T \varphi_i^*(x)\, z^*(x)\, dx \int_0^T \varphi_i(y)\, z(y)\, dy \right\} = E\{|x_i|^2\} \tag{29}$$

in the absence of noise.

The K-L detector for an LEC condition therefore correlates a
sequence of envelope detected filter responses $|x_i|^2$ with the
expected values of these responses for a noise-free channel output.
The eigenvalues in (28) are thus equivalent to the noise-free,
expected spectrogram samples $s_{ij}$ in (22). The spectrogram
analog to (27) is

$$z_{ij} = S_{rv}(t_i, f_j) = \left| \int_{-\infty}^{\infty} r(t)\, v(t_i - t) \exp(-j2\pi f_j t)\, dt \right|^2. \tag{30}$$


Equation (30) is identical to (27) if $v(t_i - t) \exp(-j 2\pi f_j t)$ is an eigenfunction $\varphi_j(t)$ of $K_z(x, y)$ and if $v(t_i - t)$ has duration equal to $T$, the integration time in (27). Consider, for example, the special case where

$$K_z(x, y) = K_z(x - y) = K_z(\tau)$$

is periodic in $\tau$ with period $T$, i.e.,

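$$K_z(\tau) = \sum_{n=0}^{N} a_n \exp(-j 2\pi n \tau / T) = \sum_{n=0}^{N} a_n \exp(-j 2\pi n x / T) \exp(j 2\pi n y / T).$$

Defining

$$\varphi_n(t) = \begin{cases} T^{-1/2} \exp(-j 2\pi n t / T), & 0 < t \le T \\ 0, & \text{otherwise,} \end{cases}$$

we have

$$K_z(x, y) = T \sum_{n=0}^{N} a_n\, \varphi_n(x)\, \varphi_n^*(y) \quad \text{for } 0 \le x \le T \text{ and } 0 \le y \le T.$$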
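A small numerical check can make the eigenfunction property concrete. The sketch below assumes a sampled version of the periodic covariance, with a hypothetical period of 16 samples and hypothetical coefficients $a_n$; it verifies that the sampled complex exponentials satisfy a discrete form of (26) with eigenvalue $T a_n$.

```python
import numpy as np

T = 16                                   # samples per period (hypothetical)
a = np.array([3.0, 2.0, 1.0, 0.5])       # hypothetical coefficients a_n
n = np.arange(len(a))
x = np.arange(T)

def K_of_lag(tau):
    """K_z(tau) = sum_n a_n exp(-j 2 pi n tau / T), evaluated at an integer lag."""
    return np.sum(a * np.exp(-2j * np.pi * n * tau / T))

# Sampled covariance matrix K[x, y] = K_z(x - y).
K = np.array([[K_of_lag(xi - yi) for yi in x] for xi in x])

def phi(m):
    """phi_m(t) = T^{-1/2} exp(-j 2 pi m t / T) sampled over one period."""
    return np.exp(-2j * np.pi * m * x / T) / np.sqrt(T)

# Residual of K phi_m = (T a_m) phi_m for each m.
residuals = [np.linalg.norm(K @ phi(m) - T * a[m] * phi(m)) for m in n]
```

The residuals should be at the level of floating-point round-off, because the sampled exponentials are orthonormal over one period.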
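Letting $v(-t)$ be a rectangular window that is constant for $0 < t \le T$ and zero otherwise (the tabulated values are illegible in the source), we have

$$z_{in} \propto \left| \int_{t_i}^{T + t_i} r(t) \exp(-j 2\pi f_n t)\, dt \right|^2 = \left| \int_0^T r(t + t_i) \exp(-j 2\pi f_n t)\, dt \right|^2 \propto \left| \int_0^T r(t + t_i)\, \varphi_n(t)\, dt \right|^2,$$

where

$$f_n = n/T.$$

If $r(t)$ is periodic with period $T$, then $E\{z_{ij}\}$ is independent of $t_i$ and

$$z_{ij} = z_j.$$

In this case

$$\ell_s(\mathbf{z}) = \sum_{i=1}^{N} \sum_{j=1}^{M} s_{ij}\, z_{ij} \propto \sum_{j=1}^{M} s_j\, z_j = \ell_{KL}^{LEC}(\mathbf{z}). \tag{31}$$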

A spectrogram correlator is thus identical to the usual K-L receiver under LEC conditions, if the basis functions that are used for spectrogram construction happen to be eigenfunctions of the signal covariance function $K_z(x, y)$. This situation occurs when the covariance function is a periodic function of $(x - y)$. The periodicity causes the noise-free reference array $s_{ij}$ to be independent of the timing parameter $i$, and the two-dimensional time-frequency correlation in the spectrogram detector becomes a one-dimensional, time-independent correlation as in the K-L detector.
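The independence from the timing parameter can be checked numerically. The following sketch assumes a rectangular window of length `T` samples and analysis frequencies $n/T$, so that each spectrogram row is the squared magnitude of an FFT of one data segment; the function name `spectrogram_samples`, the shifts, and the random test signal are hypothetical.

```python
import numpy as np

def spectrogram_samples(r, T, shifts):
    """z[i, n] = |sum_{t=0}^{T-1} r[t + t_i] exp(-j 2 pi n t / T)|^2 (rectangular window)."""
    return np.array([np.abs(np.fft.fft(r[ti:ti + T])) ** 2 for ti in shifts])

T = 32
rng = np.random.default_rng(1)
one_period = rng.standard_normal(T)
r = np.tile(one_period, 4)               # data that are periodic with period T

# Spectrogram rows for several start times t_i within the record.
z = spectrogram_samples(r, T, shifts=[0, 5, 11, 20])
```

For periodic data each segment is a circular shift of the same period, so every row of `z` is the same and the sum over time in the spectrogram correlator collapses to a single sum over frequency.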

If the signal $u(t)$ is periodic with period $T$, then $K_z(\tau)$ is also periodic with period $T$. This statement follows from the fact that $\psi_{uu}(\tau, \phi)$ is a "bed of nails" that are separated by $T$ seconds in the $\tau$-direction and by $T^{-1}$ Hz in the $\phi$-direction. Therefore $\psi_{zz}(\tau, \phi)$ in (6) has the same form. The Fourier transform of $\psi_{zz}(\tau, \phi)$ with respect to the $\phi$-variable in (25) is then a "bed of nails" that are separated by $T$ seconds in both coordinates.

The special case in (31) should therefore be applicable to many practical communication systems. For a sonar with unchanging pulse repetition frequency (PRF), the gated data from a particular range interval will also be periodic, if the scattering function of the reflectors within that interval is stationary. If a range-extended target covers a sequence of such gated intervals, and if the target scattering function is range-dependent, then one must use a sequence of K-L detectors, one for each range bin. For a given, steady PRF $= T^{-1}$, each K-L detector is a bank of band-pass filters, and the sequence of K-L detectors for different gate positions on a range-extended target is a spectrogram correlator.

Implementations  of  a K-L,  LEC  detector  and  a spectrogram 
correlator  are  compared  in  Figs.  1 and  2.  Both  receivers  consist 
of  filter  banks  followed  by  envelope  detectors,  where  the  envelopes 
can  be  generalized  for  non-Gaussian  noise  as  in  (23)  - (24).  The 
most  obvious  difference  between  the  two  systems  is  that  the  K-L 
detector  uses  a single  weighting  parameter  for  each  envelope 
detected  filter  response,  while  the  spectrogram  detector  uses  a 
sequence  of  weights  that  can  be  synthesized  as  a transversal  filter. 


[Figure residue: receiver block diagram. Input: z(t) + noise if H1 is true, noise alone if H0 is true; the output is compared with a threshold.]


The K-L detector thus appears to possess the simplest implementation.


Another advantage of the K-L detector is associated with the relatively small number of weights in Fig. 1 and with the fact that a K-L expansion is equivalent to a principal component analysis. A principal component analysis provides the best (MMSE) representation of the data for any limited number of basis functions.$^{10}$ Let the first filter $\varphi_1(T - t)$ be associated with the largest eigenvalue $\lambda_1$, let $\varphi_2(T - t)$ correspond to the second largest eigenvalue, etc. If the eigenvalues happen to decrease rapidly, then much of the expected signal energy will be concentrated at the outputs of the first few filters, and comparison of Figs. 1 and 2 indicates that this energy will also be concentrated in time. The K-L system therefore tends to concentrate signal energy into a relatively small number of uncorrelated samples. Under low SNR, LEC conditions, this concentration of energy has the same intuitive appeal as a pulse compression process for detection of a known signal with low instantaneous power.
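The degree of energy concentration can be quantified by the fraction of the total eigenvalue sum that is carried by the largest few eigenvalues. The sketch below is a minimal illustration; the function name `energy_fraction` and the eigenvalue list are hypothetical and are not taken from the report.

```python
import numpy as np

def energy_fraction(eigenvalues, k):
    """Fraction of expected signal energy captured by the k largest eigenvalues."""
    lam = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]   # descending order
    return float(lam[:k].sum() / lam.sum())

# Hypothetical, rapidly decreasing eigenvalue spectrum.
lam = [8.0, 4.0, 2.0, 1.0, 1.0]
frac2 = energy_fraction(lam, 2)   # (8 + 4) / 16
```

With these assumed values the first two filters carry three quarters of the expected signal energy.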

A rigorous approach to the diversity vs. concentration problem has been formulated by Daly, who found that a noncoherent summation as in (22) and (28) gives the best detection performance if each term in the sum has SNR $\approx 2$. The LEC condition corresponds to SNR $\ll 1$ for each of the samples in Fig. 2. The best detection performance is then obtained by decreasing the number of uncorrelated samples and by concentrating signal energy, as in Fig. 1.

There are several reasons, however, why a spectrogram detector may be preferable:

1. The spectrogram correlator does not require a new solution of the eigenfunction equation (26) every time the scattering function changes.

2. Gradual changes in the scattering function can be estimated and "tracked" with the spectrogram configuration. For example, an unbiased estimate of $s_{ij}$, the expected squared response of the $j$th filter at time $i$ in the absence of noise, is

$$\hat{s}_{ij}(n) = \frac{1}{K+1} \sum_{k=0}^{K} \left[ z_{ij}(n-k) - \sigma_j^2 \right] \approx \frac{1}{K+1} \left\{ \left[ z_{ij}(n) - \sigma_j^2 \right] + K\, \hat{s}_{ij}(n-1) \right\}, \tag{32}$$

where $\hat{s}_{ij}(n)$ is the estimate of $s_{ij}$ for the $n$th observation of the signal, $z_{ij}(n)$ is the observed response of the $j$th filter at time $i$ for the $n$th observation, and $\sigma_j^2$ is the average noise power at the filter output. Equation (32) is a simple form of recursive spectrogram estimate.

3.  From  a practical  viewpoint,  the  spectrogram  processor 
is  more  attractive  than  the  K-L  system  because  the  receiver 
in  Fig.  2 can  be  implemented  with  a sequence  of  fast  Fourier 
transform  operations.  In  particular,  one  can  take  advantage 
of  (11)  - (13),  which  imply  that  a time  window  spectrogram 
can  be  used  as  the  basic  element  in  the  implementation  of  a 
spectrogram  correlator. 

4. If it is desirable to concentrate signal energy by limiting the number of uncorrelated samples or degrees of freedom, one can force the spectrogram correlator into a K-L configuration by transmitting a pulse train with a stable pulse repetition frequency $T^{-1}$, i.e., by using a periodic signal, and by using an integration time equal to $T$.
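The recursive form of the spectrogram estimate in item 2 can be sketched in a few lines. The code below is an illustration under stated assumptions: the whole reference array is updated at once with NumPy broadcasting, the noise power is taken to be the same known constant for every filter, and the function name `update_estimate` and all numerical values are hypothetical.

```python
import numpy as np

def update_estimate(s_prev, z_new, noise_power, K):
    """One recursive step: s(n) = ([z(n) - sigma^2] + K * s(n-1)) / (K + 1)."""
    return ((z_new - noise_power) + K * s_prev) / (K + 1)

# Hypothetical 2 x 3 spectrogram (2 time samples, 3 filters).
s_prev = np.full((2, 3), 4.0)    # previous estimate of the noise-free reference
z_new = np.full((2, 3), 10.0)    # newly observed squared filter responses
s_new = update_estimate(s_prev, z_new, noise_power=2.0, K=3)   # (8 + 12) / 4
```

Because the noise power is subtracted from each observation, an individual update can be negative when the observed response falls below the assumed noise level; a practical tracker might clip such values, which is a design choice outside the report.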


2.4.6 Signal Design for Maximization of Detector Signal-to-Noise Ratio and Signal-to-Interference Ratio


The expected response of the spectrogram correlator $\ell_s(\mathbf{z})$ in (22) to a noise-free signal is

$$\ell_s(\mathbf{s}) = \sum_{i=1}^{N} \sum_{j=1}^{M} s_{ij}^2 \propto \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \left| E\{ S_{zv}(t, f) \} \right|^2 dt\, df. \tag{33}$$

The expected response to noise, in the absence of signal, is

$$\ell_s(\text{noise}) = \sigma^2 \sum_{i=1}^{N} \sum_{j=1}^{M} s_{ij} \propto \sigma^2 \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} E\{ S_{zv}(t, f) \}\, dt\, df, \tag{34}$$

where $\sigma^2$ is the expected noise power at the output of each filter.


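From Eq. (A8) in Section 2.2,

$$\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} E\{ S_{zv}(t, f) \}\, dt\, df = E_z E_v. \tag{35}$$

For a given signal energy $E_z$ and filter energy $E_v$, SNR is maximized by making $\ell_s(\mathbf{s})$ as large as possible. From Parseval's theorem and (5) - (6),

$$\text{SNR} \propto \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \left| R_H(\tau, \phi)\, \psi_{uu}(\tau, \phi)\, \psi_{vv}(-\tau, \phi) \right|^2 d\tau\, d\phi. \tag{36}$$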

A spectrogram filter bank or window function is usually designed so that $v(t)$ has minimum time-bandwidth product. The ambiguity function of $v(t)$ is then generally broad and smooth relative to $\psi_{uu}(\tau, \phi)$, and

$$\text{SNR} \le \max_{\tau, \phi} \left| \psi_{vv}(\tau, \phi) \right|^2 \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \left| R_H(\tau, \phi) \right|^2 \left| \psi_{uu}(\tau, \phi) \right|^2 d\tau\, d\phi, \tag{37}$$


where the inequality is very nearly an equality. Since

$$\max_{\tau, \phi} \left| \psi_{vv}(\tau, \phi) \right| = \left| \psi_{vv}(0, 0) \right| = E_v = 1, \tag{38}$$

we have

$$\text{SNR} \approx \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \left| R_H(\tau, \phi) \right|^2 \left| \psi_{uu}(\tau, \phi) \right|^2 d\tau\, d\phi, \tag{39}$$

and SNR is maximized by maximizing the right-hand side of (39).
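The step from the integrand in (36) to the bound in (37) can be written out explicitly. The lines below are a sketch of that one step, using only the symbols already defined; the proportionality constant implied by (36) is suppressed, as it is in (37).

```latex
\begin{aligned}
\left| R_H(\tau,\phi)\, \psi_{uu}(\tau,\phi)\, \psi_{vv}(-\tau,\phi) \right|^2
  &= \left| \psi_{vv}(-\tau,\phi) \right|^2
     \left| R_H(\tau,\phi) \right|^2 \left| \psi_{uu}(\tau,\phi) \right|^2 \\
  &\le \max_{\tau,\phi} \left| \psi_{vv}(\tau,\phi) \right|^2
     \left| R_H(\tau,\phi) \right|^2 \left| \psi_{uu}(\tau,\phi) \right|^2 .
\end{aligned}
```

Integrating both sides over $\tau$ and $\phi$ gives the bound. It is nearly tight when $|\psi_{vv}|$ stays close to its peak value over the region where $|R_H \psi_{uu}|$ is significant, which is the "broad and smooth" condition stated in the text.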

This criterion for LEC signal design was first introduced by Price. If spectrogram correlation is used and if one is willing to manipulate the spectrogram filter or window function, then $|\psi_{vv}(-\tau, \phi)|^2$ in (36) can be adjusted as an alternative to adjusting the transmitted signal.

One  advantage  of  the  spectrogram  approach  is  that  we  can 
easily  visualize  the  effects  of  cross-talk  or  interference  in  binary 
communication  and  the  effects  of  clutter  and  reverberation  with  a 
specified  scattering  function. 

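To eliminate cross-talk, i.e., to maximize signal-to-interference ratio for a binary communications problem, we should have

$$\ell_s(\mathbf{s}^{(1)}, \mathbf{s}^{(2)}) = \sum_{i=1}^{N} \sum_{j=1}^{M} s_{ij}^{(1)} s_{ij}^{(2)} = 0, \tag{40}$$

where $\{s_{ij}^{(1)}\}$ are samples of the expected noise-free spectrogram when signal $u_1(t)$ is transmitted and where $\{s_{ij}^{(2)}\}$ are samples of the noise-free spectrogram when $u_2(t)$ is transmitted. Again using Parseval's theorem, we have

$$\ell_s(\mathbf{s}^{(1)}, \mathbf{s}^{(2)}) \propto \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} E[S_{z_1 v}(t, f)] \left\{ E[S_{z_2 v}(t, f)] \right\}^* dt\, df = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \left| R_H(\tau, \phi) \right|^2 \left| \psi_{vv}(-\tau, \phi) \right|^2 \psi_{u_1 u_1}(\tau, \phi)\, \psi_{u_2 u_2}^*(\tau, \phi)\, d\tau\, d\phi. \tag{41}$$

To minimize cross-talk, the inner product of the weighted ambiguity functions $R_H \psi_{vv} \psi_{u_1 u_1}$ and $R_H \psi_{vv} \psi_{u_2 u_2}$ should be minimized.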
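The cross-talk condition can be illustrated with two reference spectrograms that occupy different frequency cells. The sketch below assumes non-overlapping rectangular windows of `T` samples and two hypothetical tones placed exactly on different DFT bins; the function names `spectrogram` and `crosstalk` are illustrative and the normalization is a choice made here, not part of the report.

```python
import numpy as np

def spectrogram(u, T):
    """Rectangular-window spectrogram: rows are time blocks, columns are DFT bins."""
    blocks = u[: len(u) // T * T].reshape(-1, T)
    return np.abs(np.fft.fft(blocks, axis=1)) ** 2

def crosstalk(s1, s2):
    """Normalized inner product sum_ij s1_ij * s2_ij of two reference spectrograms."""
    return float(np.sum(s1 * s2) / np.sqrt(np.sum(s1 ** 2) * np.sum(s2 ** 2)))

T = 64
t = np.arange(4 * T)
u1 = np.exp(2j * np.pi * 5 * t / T)     # hypothetical tone on DFT bin 5
u2 = np.exp(2j * np.pi * 12 * t / T)    # hypothetical tone on DFT bin 12
s1, s2 = spectrogram(u1, T), spectrogram(u2, T)

c12 = crosstalk(s1, s2)   # close to zero: the spectrograms do not overlap
c11 = crosstalk(s1, s1)   # equal to one: a spectrogram correlated with itself
```

Signals whose spectrograms share time-frequency cells would give a value between these two extremes.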
Another approach to the cross-talk problem is to use mismatched filtering in the spectrogram domain. Suppose that the output spectrograms for $u_1(t)$ and $u_2(t)$ are partially overlapping. A reasonable strategy is then to deaccentuate those reference samples $s_{ij}^{(1)}$ and $s_{ij}^{(2)}$ that occur in the region of expected overlap.

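For a range-distributed sonar target with transfer function $H_1(t, f)$ and for reverberation or clutter with transfer function $H_2(t, f)$, the interference that is caused by clutter echoes can be written in the same form as in (40). The effect of clutter is then minimized when

$$\ell_s(\mathbf{s}^{(1)}, \mathbf{s}^{(2)}) \propto \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} R_{H_1}(\tau, \phi)\, R_{H_2}^*(\tau, \phi) \left| \psi_{vv}(-\tau, \phi)\, \psi_{uu}(\tau, \phi) \right|^2 d\tau\, d\phi$$

is minimized. The signal design problem for sonar is then to maximize the signal-to-clutter ratio

$$\text{SCR} = \frac{\displaystyle \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \left| R_{H_1}(\tau, \phi) \right|^2 \left| \psi_{vv}(-\tau, \phi)\, \psi_{uu}(\tau, \phi) \right|^2 d\tau\, d\phi}{\displaystyle \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} R_{H_1}(\tau, \phi)\, R_{H_2}^*(\tau, \phi) \left| \psi_{vv}(-\tau, \phi)\, \psi_{uu}(\tau, \phi) \right|^2 d\tau\, d\phi} \tag{42}$$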


by an appropriate choice of signal and filter functions. Again, we observe that the spectrogram formulation allows the receiver to increase SCR by manipulation of the spectrogram window function or filter transfer function, $v(t)$. The reference spectrogram samples $\{s_{ij}^{(1)}\}$ for the target can also be adjusted to deaccentuate the areas in time-frequency space where the expected target and clutter echoes overlap.

If the samples $\{s_{ij}^{(1)}\}$ of the expected noise-free target spectrogram are to be adjusted to deaccentuate the clutter response of the spectrogram correlator, then it may be advantageous to spread out the expected target and clutter echoes in time-frequency space, so that the best weighting parameter can be used at each point. This observation suggests that the K-L detector in Figure 1 may not possess a large enough number of samples to effectively discriminate against spurious echoes, especially in a changing clutter environment.
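One simple way to deaccentuate the overlap region is to scale down the target reference samples wherever clutter energy is expected. The sketch below uses a hard mask, which is an assumption made for illustration and not a weighting rule given in the report; the function name `deaccentuate` and the small arrays are hypothetical.

```python
import numpy as np

def deaccentuate(s_target, s_clutter, attenuation=0.0):
    """Scale target reference samples by `attenuation` wherever clutter is expected."""
    weights = np.where(s_clutter > 0, attenuation, 1.0)
    return s_target * weights

# Hypothetical expected spectrograms (rows: time samples, columns: filters).
s_target = np.array([[0.0, 4.0, 2.0, 0.0],
                     [0.0, 3.0, 1.0, 0.0]])
s_clutter = np.array([[0.0, 0.0, 5.0, 5.0],
                      [0.0, 0.0, 5.0, 5.0]])

w = deaccentuate(s_target, s_clutter)

matched_clutter = float(np.sum(s_target * s_clutter))   # clutter response with matched weights
target_response = float(np.sum(w * s_target))           # target response with adjusted weights
clutter_response = float(np.sum(w * s_clutter))         # clutter response with adjusted weights
```

With these assumed arrays the adjusted weights remove the clutter response entirely while giving up the part of the target response that lies in the overlap region, which is the trade-off described in the text.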

2.4.7 Detection of a Random, Time-Varying, Range-Extended Target with a Bank of Matched Filters

A standard narrow band sonar receiver uses a bank of matched filters, where each filter is matched to a different Doppler-shifted version of the transmitted signal. This system is designed for the detection of a signal that is known except for Doppler shift and delay, in white Gaussian noise. The corresponding sonar reflector is an ideal point target with unknown, constant radial velocity.

The envelope detected matched filter responses as a function of delay and Doppler shift describe a magnitude-squared ambiguity function. The response of the filter bank can also be written in terms of a spectrogram. Suppose that the filter for a frequency window spectrogram has impulse response $v(t) = u^*(-t)$. From (2),

$$S_{uv}(t_i, f_j) = \left| \int_{-\infty}^{\infty} u(t)\, u^*(t - t_i) \exp(-j 2\pi f_j t)\, dt \right|^2 = \left| \chi_{uu}(-t_i, f_j) \right|^2, \tag{43}$$

where $\chi_{uu}(\tau, \phi)$ is a non-symmetrized ambiguity function. Using (10), the response of the filter bank to a range-extended, random, time-varying target can be written

$$E\{ S_{zv}(t, f) \} = \mathcal{S}(t, f) ** \left| \chi_{uu}(-t, f) \right|^2, \tag{44}$$

where $\mathcal{S}(t, f)$ is the channel scattering function, $**$ denotes two-dimensional convolution, and $v(t) = u^*(-t)$. For a motionless point target, $\mathcal{S}(t, f) = \delta(t)\, \delta(f)$ and $E\{ S_{zv}(t, f) \} = |\chi_{uu}(-t, f)|^2$.
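The convolution relation in (44) can be sketched on a sampled delay-Doppler grid. The code below assumes that both the scattering function and the magnitude-squared ambiguity function are given as small non-negative arrays (rows for delay, columns for Doppler) and evaluates their full two-dimensional linear convolution with zero-padded FFTs; the function name `expected_spectrogram` and the array values are hypothetical.

```python
import numpy as np

def expected_spectrogram(scattering, ambiguity_sq):
    """Full 2-D linear convolution of a sampled scattering function with |chi_uu|^2."""
    rows = scattering.shape[0] + ambiguity_sq.shape[0] - 1
    cols = scattering.shape[1] + ambiguity_sq.shape[1] - 1
    # Zero-padding to the full output size turns circular into linear convolution.
    product = np.fft.fft2(scattering, (rows, cols)) * np.fft.fft2(ambiguity_sq, (rows, cols))
    return np.fft.ifft2(product).real

# Hypothetical magnitude-squared ambiguity surface with a central peak.
A = np.array([[0.1, 0.3, 0.1],
              [0.3, 1.0, 0.3],
              [0.1, 0.3, 0.1]])

point_target = np.array([[1.0]])                 # discrete stand-in for delta(t) delta(f)
out_point = expected_spectrogram(point_target, A)

two_highlights = np.zeros((4, 1))                # two highlights separated in delay
two_highlights[0, 0] = 1.0
two_highlights[3, 0] = 0.5
out_two = expected_spectrogram(two_highlights, A)
```

For the point target the output reproduces the ambiguity surface, and for two highlights it is a sum of shifted, scaled copies, which is the time-frequency "smear" that the weights of the spectrogram correlator are meant to account for.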

Since (44) describes a special kind of spectrogram, the spectrogram correlation process in (22) and Fig. 2 can be applied. Locally optimum detection with a bank of matched filters for a random time-varying, range-extended target is then implemented by forming a weighted sum of sampled, envelope detected filter responses. The weighting parameters take account of the time-frequency "smear" that is introduced by the scattering function of the target and the propagation medium. If a target echo consists of a sequence of separate highlights, for example, then the receiver attempts to add the envelope detected matched filter responses from the various highlights in order to improve detection performance.

For clutter rejection, the formulation in (40) is still applicable, i.e., the weights $s_{ij}$ should be adjusted so that the spectrogram correlator response to the target is large relative to the clutter response.

2.4.8  Conclusion 

A spectrogram correlator is a locally optimum detector for signals that have been passed through a random, time-varying channel. The standard (Karhunen-Loève) detector for such signals forms a weighted sum of the envelope detected outputs of a customized set of filters that are determined from a channel description by means of an eigenfunction equation. The spectrogram correlator is easier to implement because it uses standard FFT processing and there is no need to solve an eigenfunction equation. The spectrogram correlator can also be used to estimate changes in the channel, and it can easily be updated to account for these changes.

The standard K-L detector tends to concentrate signal energy into relatively few sample points, and this energy concentration appears to be advantageous under low SNR conditions. The use of a periodic signal (constant PRF in sonar) appears to have the same concentration effect upon the spectrogram detector, and in fact leads to a K-L detector implementation that is the same as a spectrogram correlator. For interference rejection, a spectrogram representation with more degrees of freedom may be preferable.

Both the standard K-L detector and the spectrogram correlator can be adapted to non-Gaussian noise environments by changing the power law of the envelope detection operation.

The use of a spectrogram correlator leads to an especially clear intuitive picture of signal and receiver requirements for the elimination of cross-talk and for clutter rejection. In addition to adjusting the signal, one can adjust the time window or filter function that is used to construct the spectrogram. The reference samples $\{s_{ij}\}$ can also be adjusted to increase the output signal-to-clutter ratio.

The response of a bank-of-matched-filters sonar receiver can be viewed as a special kind of spectrogram. This interpretation provides insight into the evolution of a receiver when the detection problem changes from (1) a known signal in Gaussian noise to (2) a signal with unknown parameters in non-Gaussian noise to (3) a random signal in non-Gaussian noise. The transition to a signal with unknown parameters leads to a bank of matched filters instead of just one, and the possibility of non-Gaussian noise results in an adjustable power-law device at the output of each filter. Finally, the transition to a random signal results in a detector that computes a weighted sum of sampled, envelope detected filter responses. In this case, the channel is modelled as an array of point scatterers with different ranges and velocities.
2.4.10 Appendix to Section 2.4


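This appendix provides a proof of Equation (3), which reads

$$\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} E[S_{uv}(t, f)]\, e^{-j 2\pi (\phi t - \tau f)}\, dt\, df = \psi_{uu}(\tau, \phi)\, \psi_{vv}(-\tau, \phi), \tag{A1}$$

where

$$\psi_{uu}(\tau, \phi) = \int_{-\infty}^{\infty} E[u^*(t - \tau/2)\, u(t + \tau/2)] \exp(-j 2\pi \phi t)\, dt \tag{A2}$$

$$\psi_{vv}(-\tau, \phi) = \int_{-\infty}^{\infty} v^*(t + \tau/2)\, v(t - \tau/2) \exp(-j 2\pi \phi t)\, dt. \tag{A3}$$

Eq. (A1) defines a two-dimensional Fourier transform operation, and the inverse transform is

$$\begin{aligned}
\int\!\!\int \psi_{uu}(\tau, \phi)\, \psi_{vv}(-\tau, \phi)\, e^{j 2\pi (\phi t - \tau f)}\, d\tau\, d\phi
&= \int\!\!\int\!\!\int\!\!\int e^{-j 2\pi \tau f} E[u^*(x - \tau/2)\, u(x + \tau/2)]\, v^*(y + \tau/2)\, v(y - \tau/2) \exp\{-j 2\pi \phi [y - (t - x)]\}\, d\phi\, dy\, dx\, d\tau \\
&= \int\!\!\int\!\!\int e^{-j 2\pi \tau f} E[u^*(x - \tau/2)\, u(x + \tau/2)]\, v^*(y + \tau/2)\, v(y - \tau/2)\, \delta[y - (t - x)]\, dy\, dx\, d\tau \\
&= \int\!\!\int e^{-j 2\pi \tau f} E[u^*(x - \tau/2)\, u(x + \tau/2)]\, v^*(t - x + \tau/2)\, v(t - x - \tau/2)\, dx\, d\tau.
\end{aligned} \tag{A4}$$

With the change of variables $x_1 = x + \tau/2$, $\tau_1 = x - \tau/2$, (A4) becomes

$$\int\!\!\int e^{-j 2\pi f (x_1 - \tau_1)} E[u^*(\tau_1)\, u(x_1)]\, v^*(t - \tau_1)\, v(t - x_1)\, dx_1\, d\tau_1 \tag{A5}$$

$$= E\left[ \left| \int u(x_1)\, v(t - x_1)\, e^{-j 2\pi f x_1}\, dx_1 \right|^2 \right] = E[S_{uv}(t, f)], \tag{A6}$$

where $S_{uv}(t, f)$ is defined in (2). Eq. (A1) follows from (A4) - (A6).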


2.5 Summary and Conclusions for Volume 2

From an engineering point of view, the most important result in this volume is that a spectrogram representation of a random signal contains the same information as the signal covariance function. A detector that uses a spectrogram representation is therefore equivalent to one that is based upon a description of the signal covariance function. The spectrogram detector, in fact, appears to be more robust in situations where (1) the signal covariance matrix is not known exactly, and (2) the background noise is not necessarily Gaussian.

From a biological modelling viewpoint, a spectrogram correlation process seems to explain human frequency discrimination data as well as the performance of humans in detecting sinusoids with harmonics.

Given the above observations, it is not surprising that spectrogram correlation, combined with principal component analysis, has out-performed some other processors (Volume 1). If mismatched filtering for maximization of signal-to-interference ratio had been applied, the clutter performance in Volume 1 could probably have been improved even further!